Abstract: Most state-of-the-art approaches for securing XML documents allow users to access data only through authorized views defined by annotating an XML grammar (e.g. a DTD) with a collection of XPath expressions. To prevent improper disclosure of confidential information, user queries posed on these views need to be rewritten into equivalent queries on the underlying documents. This rewriting enables us to avoid the overhead of view materialization and maintenance. A major concern here is that query rewriting for recursive XML views is still an open problem. To overcome this problem, some works have proposed to translate XPath queries into non-standard ones, called Regular XPath queries. However, query rewriting under Regular XPath can be of exponential size, as it relies on an automaton model. Most importantly, Regular XPath remains a theoretical achievement: it is not commonly used in practice, as translation and evaluation tools are not available. In this paper, we show that query rewriting is always possible for recursive XML views using only the expressive power of standard XPath. We investigate the extension of the downward class of XPath, composed only of the child and descendant axes, with some axes and operators, and we propose a general approach to rewrite queries under recursive XML views. Unlike Regular XPath-based works, we provide a rewriting algorithm which processes the query only over the annotated DTD grammar and which can run in linear time in the size of the query. An experimental evaluation demonstrates that our algorithm is efficient and scales well.

Keywords: Query Rewriting, XML Access Control, XML Views, XPath.
Secure Querying of Recursive XML Views: A Technique Based on Standard XPath

Summary: Most existing work on access control for XML documents relies on defining, for each user, a view that represents the parts of the data the user is authorized to read and/or modify. This view is the result of annotating the grammar associated with the XML document (for example a DTD) with various access conditions expressed as XPath expressions. To prevent access to confidential data (hidden by the view), each query posed by the user on the view must be rewritten so that it can be evaluated safely on the original document. This rewriting avoids the cost of materializing and maintaining the view. However, rewriting XPath queries in the case of recursive XML views remains an open problem. To overcome this problem, some works have proposed using a non-standard query language called "Regular XPath". Nevertheless, "Regular XPath" remains at the theoretical stage, since no evaluation tool is available in practice. One implementation of this language is based on automata, which can lead to an exponential rewriting complexity.

In this paper, we show that rewriting XPath queries in the case of recursive XML views is possible without any transformation into other languages (such as "Regular XPath"), and can be done in linear time. We study the extension of the XPath fragment known as the downward class (composed only of the child and descendant axes) with certain XPath axes and operators. Based on this extension, we propose a general model for rewriting XPath queries over arbitrary XML views, recursive or not. An experimental study demonstrates the efficiency and the scalability of our rewriting algorithm.

Keywords: Query rewriting, Access control for XML documents, XML views, XPath.
1 Introduction

XML has become the standard for representing and exchanging data across the Web. With this emergence, a challenge arises regarding the security of XML documents whose content is available to one or more users based on their access privileges. The first access control models for securing XML were proposed in [7, 10, 16]. However, these models suffer from various limitations: they can cause leakage of sensitive information [7], focus on annotating the entire XML data to deal with the limitations of static analysis [10], or rely on costly schemes for rewriting user queries [16].

To avoid these problems, the notion of XML security views was studied by Fan et al. [4]. We briefly review the main principle of the approaches based on XML security views. We start with the fragment of XPath called the downward class, where the axes are limited to the child and descendant axes; we use this fragment since it is commonly used in practice. An XML document T conforming to a DTD D (i.e. T is an instance of D) can be queried simultaneously by different users. For each class of users, a security view is defined by annotating D with access conditions that specify the accessible and inaccessible element types of the DTD. The annotated version of D is then sanitized by removing the inaccessible element types, which results in a DTD view D_v. The security view is then defined as V=(D_v, σ), where D_v is given to the users and describes the accessible data they are able to see, and σ is a function used to extract, for each XML document T conforming to D, its view T_v representing only the data the users are authorized to see. Each query over the view is translated into an equivalent one in order to be evaluated over the original data T.



Problem Statement. The problem of XPath query rewriting studied in this paper is defined as follows:

Given a DTD D, an XML security view V=(D_v, σ), and an XPath query Q over D_v, the rewriting problem consists in defining a rewriting function R that computes another XPath query R(Q) over the original DTD D such that, for any instance T of D and its view T_v computed w.r.t. V, the evaluation of Q on T_v yields the same result as the evaluation of R(Q) on T.

Most of the security view-based approaches to XPath query rewriting deal only with non-recursive DTDs [4, 9, 11]. A DTD is recursive iff at least one of its element types is defined (directly or indirectly) in terms of itself. Note that recursive DTDs often arise when specifying medical data, and the problem of query rewriting is more intriguing in this case. A security view is recursive if its DTD view D_v is recursive.

For each pair of element types A and B in the DTD view, the function σ gives an XPath expression denoting the set of paths that reach a B element from an A element in the original document (where the element types lying between A and B are hidden by the view and accounted for by σ). However, in the case of recursive DTDs, the σ function is not computable, since there may be an infinite set of paths from A to B, and the notion of security view (as defined above) cannot be used in query rewriting. That is why some authors [5, 6] resort to Regular XPath to avoid this problem. However, Regular XPath remains of theoretical use only, since no evaluation tools have been provided for the practical use of this language.
To the best of our knowledge, no practical approach exists for answering queries under recursive XML security views. Accordingly, XPath query rewriting under recursive views remains an open issue.

Contribution. Our main contribution is to make query rewriting possible for recursive XML security views using only the expressive power of standard XPath. We show that extending the downward class of XPath queries with some axes and operators is sufficient to deal with query rewriting under recursion (without the need for the Kleene star or for a translation from XPath to Regular XPath).

Intuitively, for a query Q in the downward class of XPath, our rewriting solution consists in computing another query Q' = R(Q) in an extended fragment of XPath, in such a way that for any instance T of D and its view T_v (computed w.r.t. D_v), the evaluation of Q on T_v gives the same result as evaluating Q' on T.

We provide a linear rewriting algorithm for arbitrary views (recursive or not) which, unlike Regular XPath-based works (relying on Mixed Finite Automata [?]), consists only in processing the query over the annotated DTD to produce the equivalent query on any valid instance of the original DTD. We validate our solution with a performance evaluation which shows that our rewriting algorithm is efficient and scales well. Lastly, we show how our solution can be extended to deal with a large fragment of XPath (including upward axes) and to go beyond some limitations of existing access control specification languages.

Related Work. We briefly discuss two classes of approaches to access control policy enforcement for XML documents: those that do not rely on an XML grammar and those that do.

In [7], the authors propose a formal model to specify access control for XML documents independently of the DTD. The definition of policy rules is based on XPath, and each query is rewritten by adding to it a predicate access (which represents all accessible data). We use the same predicate access principle in our rewriting approach. However, sensitive information can still be inferred, since only the last subquery is controlled among all the subqueries traversed by the query. To overcome this problem, we improve the method given in [7] by attaching a predicate access to each entity (element/attribute) traversed by the query.



Vercammen [16] proposes a method based on the intersection and union of XPath queries to avoid the problem of information leakage. The policy rules are translated into a single query which stands for all accessible data; this query is intersected with each query requested over the user's XML document view. However, this approach yields the same performance as materializing this view.

Other access control approaches are based on the notion of security views and the query rewriting principle. Fan et al. [4] propose the notion of security view obtained by annotating a regular non-recursive DTD. Restricting themselves to the downward class of XPath queries allowed them to achieve more precise query rewriting, i.e. computing all possible paths connecting each two adjacent elements in the query, which yields practical performance gains for query evaluation. A view derivation algorithm is proposed to compute the DTD view w.r.t. the access conditions, and an optimization step is also applied to the rewritten query.
However, to keep the DTD view regular, an inaccessible element may be replaced with an anonymous dummy element, which can be a source of security breaches. In [9, 11], the authors refine the model of Fan et al. by eliminating dummies, extending the class of XPath queries with upward axes, and introducing a novel notion of security views. Different types of policies are also discussed. These works can deal only with non-recursive DTDs; they are inapplicable to recursive DTDs because the description of the paths connecting two element types may be infinite.

Unlike XPath query rewriting over non-recursive DTDs, the problem posed by recursion has received little attention. The authors of [?] extend the principle proposed in [4] with a translation of XPath queries into Regular XPath and propose a first algorithm for evaluating Regular XPath over XML data. In [?], a more general rewriting approach has been studied, dealing with restrictions on the class of queries and on DTD types. The accessibility function defined there is based on the Kleene star. It should be noted that the Kleene star cannot be expressed in standard XPath.

Although query formulation and rewriting in Regular XPath are more expressive than in standard XPath, we could not find any practical system for either of the proposed approaches¹. Consequently, the need for a system rewriting XPath queries over recursive XML security views remains an open issue.

Plan of the paper. The rest of the paper is organized as follows. Section 2 formally presents the query rewriting problem for recursive views and sketches our solution to this problem. In Section 3 we give the ingredients of our access control specification. Our rewriting approach is detailed in Section 4. Section 5 presents how our approach can be extended to consider a large fragment of XPath and used to overcome some limitations of existing access control approaches. Implementation issues are presented in Section 6. Finally, we conclude the paper in Section 7.

2 Formal Problem Statement 

In this section we present the query rewriting problem for recursive views, and 
sketch our solution to deal with this problem. 

2.1 Preliminaries 

We briefly review some notions of Document Type Definitions (DTDs) and the class of XPath queries most used in practice.

DTDs. Without loss of generality, we represent a DTD by a triple (Ele, P, root), where Ele is a finite set of element types, root is a distinguished type in Ele (called the root type), and P is a function defining element types such that, for any A in Ele, P(A) is a regular expression α defined by:

$$\alpha ::= \mathit{str} \mid \epsilon \mid B \mid \alpha\,\text{``,''}\,\alpha \mid \alpha\,\text{``|''}\,\alpha \mid \alpha^{*}$$

where str denotes the text type PCDATA, ε is the empty word, B is an element type in Ele, and finally α "," α, α "|" α, and α* denote concatenation, disjunction, and the Kleene closure respectively. We refer to A → P(A) as the production of A. For each element type B occurring in P(A), we refer to B as a subelement type (or child type) of A and to A as a superelement type (or parent type) of B. If an element type A is defined in terms of an element type B directly (B is a subelement type of A) or indirectly, then A is an ancestor type of B and B is a descendant type of A. The DTD is said to be recursive if some element type A is defined in terms of itself, directly or indirectly.

¹ According to [12], the SMOQE system proposed in [6] has been withdrawn because of ongoing research.
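The notions of descendant type and recursion can be checked mechanically. The following Python sketch assumes that a DTD is given only through its child-type graph (each element type mapped to the types occurring in P(A), text omitted); the graph below and the names `descendant_types` and `is_recursive` are hypothetical, loosely modelled on the recursive DTD discussed later.

```python
from collections import deque

# Hypothetical child-type graph: each element type maps to the element types
# occurring in its production P(A).
DTD = {
    "root": ["A"],
    "A": ["B", "D"],
    "B": ["H"],
    "D": ["E", "F", "G"],
    "E": ["G"],
    "F": ["G"],
    "G": ["D", "H"],
    "H": [],
}

def descendant_types(dtd, a):
    """Return every type reachable from `a` through one or more child-type edges."""
    seen = set()
    queue = deque(dtd[a])
    while queue:
        b = queue.popleft()
        if b not in seen:
            seen.add(b)
            queue.extend(dtd[b])
    return seen

def is_recursive(dtd):
    """A DTD is recursive iff some element type is one of its own descendant types."""
    return any(a in descendant_types(dtd, a) for a in dtd)

print(sorted(descendant_types(DTD, "D")))  # ['D', 'E', 'F', 'G', 'H']
print(is_recursive(DTD))                   # True
```

A breadth-first traversal keeps each type visited once, so both checks run in time linear in the size of the graph per starting type.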

We use a graph representation to depict our DTDs, where dashed edges represent disjunction.



XML Documents. We model an XML document as an unranked, ordered, finite node-labeled tree, also called an XML tree. Let Σ be a finite set of node labels; an XML tree T over Σ is a structure defined as

$$T = (N, R_{\downarrow}, R_{\rightarrow}, L)$$

where N is the set of nodes, R_↓ ⊆ N × N is the parent-child relation, R_→ ⊆ N × N is a successor relation on (ordered) siblings, and L : N → Σ assigns a label to each node.
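A minimal Python rendering of this tree model follows, under our own assumptions: nodes are plain objects identified by Python identity, and the class `Node` and the function `relations` are illustrative names that derive N, R_↓, R_→ and L from an explicit parent/children structure.

```python
class Node:
    """One node of an XML tree; children are kept in document (sibling) order."""
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []
        if parent is not None:
            parent.children.append(self)

def relations(root):
    """Return (N, R_child, R_sibling, L) for the tree rooted at `root`.

    R_child holds (parent, child) pairs, R_sibling holds (left, right) pairs of
    consecutive siblings, and L maps each node to its label."""
    nodes, r_child, r_sib, labels = [], set(), set(), {}
    stack = [root]
    while stack:
        n = stack.pop()
        nodes.append(n)                               # preorder traversal
        labels[n] = n.label
        r_child.update((n, c) for c in n.children)
        r_sib.update(zip(n.children, n.children[1:]))
        stack.extend(reversed(n.children))
    return nodes, r_child, r_sib, labels

root = Node("root")
a = Node("A", root)
b, c = Node("B", a), Node("C", a)
N, R_child, R_sib, L = relations(root)
print([n.label for n in N])  # ['root', 'A', 'B', 'C']
```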

An XML document T conforms to a DTD D if the following conditions hold: (i) the root of T is the unique node labeled with root; (ii) each node in T is labeled either with an element type A, in which case it is called an A element, or with str, in which case it is called a text node; (iii) for each node n of type A with k ordered children n_1, ..., n_k, the word L(n_1)...L(n_k) belongs to the regular language defined by P(A); (iv) each text node carries a string value (PCDATA) and is a leaf of the tree. We call T an instance of D if T conforms to D. In the DTD instances depicted in our figures, we use X^i to distinguish between elements of the same type X.
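Condition (iii) is an ordinary regular-language membership test. As a sketch, assuming content models are encoded as nested Python tuples (our own encoding, not the structure produced by the paper's DTD parser), P(A) can be translated into a Python regular expression over `<label>` tokens and matched against the children's label word with `re.fullmatch`.

```python
import re

# Assumed encoding of content models:
#   "str" (text), "eps" (empty word), an element type name,
#   ("seq", g1, g2, ...), ("alt", g1, g2, ...), ("star", g).
def to_regex(model):
    """Translate a content model into a Python regex over '<label>' tokens."""
    if model == "eps":
        return ""
    if isinstance(model, str):          # element type, or 'str' for a text node
        return re.escape(f"<{model}>")
    op, *args = model
    if op == "seq":
        return "".join(f"(?:{to_regex(g)})" for g in args)
    if op == "alt":
        return "(?:" + "|".join(to_regex(g) for g in args) + ")"
    if op == "star":
        return f"(?:{to_regex(args[0])})*"
    raise ValueError(f"unknown operator: {op!r}")

def children_conform(model, child_labels):
    """Condition (iii): the word L(n1)...L(nk) belongs to the language of P(A)."""
    word = "".join(f"<{label}>" for label in child_labels)
    return re.fullmatch(to_regex(model), word) is not None

P_A = ("seq", "B", ("alt", "C", "D"), ("star", "E"))  # A -> B, (C|D), E*
print(children_conform(P_A, ["B", "D", "E", "E"]))  # True
print(children_conform(P_A, ["D", "B"]))            # False
```

Delimiting each label with `<` and `>` keeps labels such as `A` and `AB` from being confused when the word is matched.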



XPath Queries. We introduce the downward class of XPath queries, referred to as X and defined as follows:

$$\begin{aligned}
\mathit{path} &::= \mathit{axis}::\mathit{label} \mid \mathit{path}\,[\mathit{qual}] \mid \mathit{path}/\mathit{path} \mid \mathit{path} \cup \mathit{path}\\
\mathit{qual} &::= \mathit{path} \mid \mathit{path} = c \mid \mathit{qual}\ \mathrm{and}\ \mathit{qual} \mid \mathit{qual}\ \mathrm{or}\ \mathit{qual} \mid \mathrm{not}\ \mathit{qual} \mid (\mathit{qual})\\
\mathit{axis} &::= \downarrow \mid \downarrow^{+}
\end{aligned}$$

where label refers to an element type in Ele or * (which matches all labels), ∪ stands for union, c denotes a text constant, axis is the XPath axis relation, ↓ and ↓+ denote the child and descendant axes respectively, and finally qual is called an XPath qualifier (predicate or filter), which can be a text content comparison, an XPath query, or a boolean expression (using boolean operators such as and, or, not).

Let n be a node in an XML tree T. The evaluation of an XPath query p at node n, called the context node, results in the set of nodes which are reachable from n with p, denoted by n[[p]]. A qualifier q is said to be valid at the context node n, denoted by n ⊨ q, iff one of the following conditions holds: (i) q is given by p=c and there is at least one element reachable from n with p which has c as text content; (ii) q is an XPath query and n[[q]] is nonempty; (iii) q is a boolean expression (e.g. not(p)) and it evaluates to true at n.
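The semantics n[[p]] and n ⊨ q can be prototyped with a small recursive evaluator. This Python sketch is only an illustration under our own assumptions: it covers steps on the child (`"child"`, ↓) and descendant (`"desc"`, ↓+) axes, path qualifiers, and `p = c` comparisons, and omits union and the boolean connectives; a query is written as a list of `(axis, label, qualifiers)` triples, an encoding chosen for brevity.

```python
class Node:
    """Element node with a label, optional text content and ordered children."""
    def __init__(self, label, text=None, children=()):
        self.label, self.text, self.children = label, text, list(children)

def axis_nodes(n, axis):
    """Nodes reached from n by 'child' (↓) or 'desc' (↓+), in document order."""
    if axis == "child":
        return list(n.children)
    out = []
    for c in n.children:
        out.append(c)
        out.extend(axis_nodes(c, "desc"))
    return out

def holds(n, qual):
    """n ⊨ qual for a qualifier that is a path or an ('eq', path, c) comparison."""
    if isinstance(qual, tuple) and qual[0] == "eq":
        return any(m.text == qual[2] for m in evaluate(n, qual[1]))
    return len(evaluate(n, qual)) > 0

def evaluate(n, path):
    """n[[path]] for a path given as a list of (axis, label, qualifiers) steps."""
    current = [n]
    for axis, label, quals in path:
        nxt = []
        for ctx in current:
            for m in axis_nodes(ctx, axis):
                if (label in ("*", m.label) and m not in nxt
                        and all(holds(m, q) for q in quals)):
                    nxt.append(m)
        current = nxt
    return current

doc = Node("root", children=[Node("A", children=[Node("B", text="topo"), Node("C")])])
query = [("desc", "A", [("eq", [("child", "B", [])], "topo")])]  # ↓+::A[↓::B='topo']
print([m.label for m in evaluate(doc, query)])  # ['A']
```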
2.2 XML Access Control Model

We define below some concepts of security specifications and XML security views as initially presented in [4, 9].

Security Specifications. Consider an XML document T conforming to a DTD D. For each class of users, some access privileges may be defined to restrict access to sensitive information in T. Thus, an access control specification language is defined to specify which elements in T the users are granted, denied, or conditionally granted access to. An access specification in this language is defined as follows:

Definition 2.1 An access specification S is a pair (D, ann) consisting of a DTD D and a partial mapping ann such that, for each production A → P(A) and each element type B in P(A), ann(A, B), if explicitly defined, is an annotation of the form:

$$\mathit{ann}(A, B) ::= Y \mid N \mid [Q]$$

where [Q] is a qualifier in our XPath fragment X. A special case is the root of D, for which we define ann(root)=Y by default. □

The specification values Y, N, and [Q] indicate that the B children of A elements in an XML document conforming to the DTD D are accessible, inaccessible, or conditionally accessible respectively. If ann(A, B) is not explicitly defined, then B inherits the accessibility of A. On the other hand, if ann(A, B) is explicitly defined, then B may override the accessibility inherited from A. A text node is accessible only if its parent element is accessible. For an element node n of type B with a parent node of type A, we say that n is concerned by an annotation if ann(A, B)=value exists; moreover, this annotation is valid at n if value=Y, or value=[Q] and n ⊨ Q.
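The override/inherit rule above can be read as a recursion along the path from the root. In this hedged Python sketch, `ann` is a dictionary from (parent type, child type) to "Y", "N", or a Python predicate standing in for a qualifier [Q] (an assumption, since real qualifiers are XPath expressions). It implements only the rule stated in this paragraph; it does not model the additional restriction on ancestor qualifiers introduced in Section 2.5.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []
        if parent is not None:
            parent.children.append(self)

def accessible(n, ann):
    """Accessibility of element node n: explicit annotations override, otherwise inherit.

    `ann` maps (parent type, child type) to "Y", "N", or a predicate for [Q]."""
    if n.parent is None:                      # ann(root) = Y by default
        return True
    value = ann.get((n.parent.label, n.label))
    if value is None:                         # not annotated: inherit from the parent
        return accessible(n.parent, ann)
    if value == "Y":
        return True
    if value == "N":
        return False
    return bool(value(n))                     # conditional annotation [Q]

# root -> A -> B -> D and root -> A -> C -> D
root = Node("root"); a = Node("A", root)
b, c = Node("B", a), Node("C", a)
d1, d2 = Node("D", b), Node("D", c)
ann = {("A", "B"): "N", ("A", "C"): "N", ("C", "D"): "Y"}
print(accessible(d1, ann), accessible(d2, ann))  # False True
```

The D under B inherits the inaccessibility of B, while the D under C overrides the inherited N thanks to ann(C, D)=Y.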

Security Views. To enforce an access specification, a security view is defined to compute, for each document T conforming to a DTD D: (i) an instance view T_v containing only accessible data; and (ii) a DTD view D_v which describes the schema of all accessible data. Both T_v and D_v are seen only by authorized users.

More formally, let S=(D, ann) be an access specification. A security view for S is defined as a pair V=(D_v, σ) where: (i) D_v is a view of D computed by eliminating inaccessible element types² from D, according to the annotations given in S; (ii) σ is a function used to extract accessible data in such a way that, for each pair of types A and B where B occurs in P(A) in D_v, σ(A, B) is an XPath query (in fragment X) defining the paths that reach B element nodes from an A element node in the original document T conforming to D. It should be noted that the function σ is hidden from the users. A security view V=(D_v, σ) is said to be recursive if its D_v is recursive.
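To make the instance view concrete, the sketch below builds T_v explicitly from T by dropping inaccessible nodes and attaching each accessible node to its nearest accessible ancestor. It is for illustration only (our framework never materializes T_v); it assumes accessibility flags have already been computed and represents nodes as `(label, flag, children)` tuples, an encoding of our own. The flags are chosen to match the annotations of Example 2.1, assuming q holds at A.

```python
# A tree node is (label, accessible_flag, children); flags are assumed precomputed.
def view(node):
    """Return the list of view trees contributed by `node`.

    An accessible node is kept and adopts the view trees of its children; an
    inaccessible node disappears and its accessible descendants are lifted to
    its nearest accessible ancestor."""
    label, is_accessible, children = node
    kept = [v for child in children for v in view(child)]
    return [(label, kept)] if is_accessible else kept

T = ("root", True, [
    ("A", True, [
        ("B", False, [("D", False, [])]),
        ("C", False, [("D", True, [])]),
    ]),
])
print(view(T))  # [('root', [('A', [('D', [])])])]
```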

Example 2.1 Consider the DTD D depicted in Figure 1(a), where the labels of the edges represent the following access specification: ann(root, A)=[q], ann(A, B)=N, ann(A, C)=N, and ann(C, D)=Y. We define the security view V=(D_v, σ) as follows³:

D_v: root → A, A → D | ε, D → ε.

root → A: we have σ(root, A)=A[q].

A → D | ε: we get σ(A, D)=(C ∪ ε)/D.

Figure 1(b) depicts the resulting view D_v. □

Figure 1: Simple non-recursive DTD. (a) DTD D; (b) view D_v.

² An element type B is inaccessible if, for each parent type A, either (i) the annotation ann(A, B)=N exists or (ii) A is an inaccessible element type.



2.3 DTD Recursion Problem

Most existing approaches [4, 9, 11] for securing access to XML documents are based on the notion of security view. Given a security view V=(D_v, σ), the query rewriting principle is applied to translate each XPath query p over D_v into another one p_t over the original DTD D, such that for any instance T of D (T_v of D_v resp.), p_t over T yields the same answer as p over T_v (i.e. p_t(T) = p(T_v))⁴. Thanks to query rewriting, we do not need to materialize the view T_v, and we thus avoid its major problem, namely view maintenance. However, only non-recursive DTDs are considered: the security view as specified above cannot be applied in the case of recursive DTDs. To illustrate this problem we give the following example:



Example 2.2 For the query ↓+::H over the DTD given in Figure 2(a), we should enumerate all the paths from the root which give an accessible element H (as done in [4]): /root/A[q']/(B ∪ D/E/G)/H. However, the task is complicated in the case of recursion. With the same query over the DTD in Figure 2(b), the function σ used to extract accessible data cannot be defined: e.g. σ(D, E) can be E, F/G/D/E, F/G/D/F/G/D/E, etc. Thus σ(D, E) leads to infinitely many paths and cannot be defined in X. Moreover, the rewriting of the query ↓+::H is equivalent to the following regular expression:

$$/\mathit{root}/A[q]/B/H \;\cup\; /\mathit{root}/A[q]/q_1/(q_1)^{*}/H \;\cup\; /\mathit{root}/A[q]/(q_2)^{*}/D/E/G/(D/G)^{*}/H$$

where q_1 = D/G ∪ D/E/G and q_2 = D/G ∪ D/E/G ∪ D/F/G. □
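The infinite family of paths for σ(D, E) can be reproduced with a breadth-first path enumerator. The graph below is a hypothetical fragment (D → E, D → F, F → G, G → D) chosen only because it yields the paths listed in the example; it is not claimed to be the full DTD of Figure 2(b), and `paths` is an illustrative name.

```python
from collections import deque
from itertools import islice

# Hypothetical fragment of a recursive DTD that produces the paths of Example 2.2.
GRAPH = {"D": ["E", "F"], "E": [], "F": ["G"], "G": ["D"]}

def paths(graph, source, target):
    """Yield label paths from `source` to `target`, shortest first.

    A path stops as soon as it reaches `target`; because the graph has a cycle,
    the generator never runs out, which is why sigma(source, target) is not a
    finite union of paths expressible in X."""
    queue = deque([child] for child in graph[source])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            yield "/".join(path)
            continue
        queue.extend(path + [child] for child in graph[path[-1]])

print(list(islice(paths(GRAPH, "D", "E"), 3)))
# ['E', 'F/G/D/E', 'F/G/D/F/G/D/E']
```

Only a finite prefix can be materialized with `islice`; the Kleene star in the regular expression above is exactly what summarizes the rest.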



³ Note that ε denotes the empty path.

⁴ We denote by p(T) the result of evaluating query p over document T.



RR n° 7834 






H. Mahfoud and A. Imine 




Figure 2: DTD Recursion Problems. 



Since the Kleene star (denoted by *) is not part of standard XPath and cannot be expressed in it, as outlined in [15], the rewriting of XPath queries is not always possible. We refer to this problem as the non-closure of an XPath fragment under query rewriting. The closure property is defined as follows:

Definition 2.2 A class C of XPath queries is closed under query rewriting if there is a function Rewrite: C → C that, for any security view V=(D_v, σ) and any query Q in C over D_v, computes Q_t=Rewrite(Q) in C such that, for any instance T conforming to the original DTD D and its view T_v w.r.t. V, we have Q(T_v) = Q_t(T). □

It has been shown in [5] that the downward class (i.e. fragment X) of XPath queries is not closed under query rewriting.

Theorem 1 For recursive XML security views, fragment X is not closed under query rewriting [5]. □

2.4 Our Proposed Solution 

We show that the expressive power of standard XPath [2] is sufficient to overcome the query rewriting problem over recursive views. We propose to redefine the function Rewrite given in Definition 2.2 as Rewrite: C_1 → C_2, where C_2 is the fragment C_1 extended with some axes and operators. Using this extension, for any access specification S=(D, ann) and any query Q in C_1 over D_v (the view of D computed w.r.t. S), we can compute Q_t = Rewrite(Q) in C_2 such that, for any instance T of D and its view T_v w.r.t. S, we have Q(T_v) = Q_t(T).

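The input fragment C_1 in our case is fragment X (namely the downward class) defined in Section 2.1, which is used only to formulate user queries and to define access specifications, while C_2 is an extended fragment of X defined as follows:

$$\begin{aligned}
\mathit{path} &::= \mathit{axis}::\mathit{label} \mid \mathit{path}\,[\mathit{qual}] \mid \mathit{path}\,[n] \mid \mathit{path}/\mathit{path} \mid \mathit{path} \cup \mathit{path}\\
\mathit{qual} &::= \mathit{path} \mid \mathit{path} = c \mid \mathit{path} = \epsilon::\mathit{label} \mid \mathit{qual}\ \mathrm{and}\ \mathit{qual} \mid \mathit{qual}\ \mathrm{or}\ \mathit{qual} \mid \mathrm{not}\ \mathit{qual} \mid (\mathit{qual})\\
\mathit{axis} &::= \epsilon \mid \downarrow \mid \uparrow \mid \downarrow^{+} \mid \uparrow^{+} \mid \uparrow^{*}
\end{aligned}$$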

We enrich X with the self axis (ε), the upward axes parent (↑), ancestor (↑+), and ancestor-or-self (↑*), and with the position and node comparison predicates. The position predicate, written [n] (n ∈ ℕ), is used to return the n-th node from an ordered set of nodes. For instance, since we model the XML document as an ordered tree, the query ↓::*[1] at an element node n returns its first child element, while ↑+::*[↓::B='topo'][1] returns its first ancestor element which has a child element B with text content 'topo'. The node comparison predicate [target_1 = target_2] is true only if the evaluations of the left and right sides result in exactly the same single node. For example, the predicate in the query ↓+::A/↓::B[↑::* = …] is valid for any B element child of an A element.
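The following Python sketch illustrates the two new predicates on a tiny tree. It assumes, as in XPath, that positions on reverse axes such as ancestor count from the nearest node, so [1] selects the closest ancestor. The node comparison described here behaves like the `is` operator of XPath 2.0, which tests node identity, whereas the general comparison `=` of standard XPath compares values; the helper names below are our own.

```python
class Node:
    def __init__(self, label, parent=None, text=None):
        self.label, self.parent, self.text, self.children = label, parent, text, []
        if parent is not None:
            parent.children.append(self)

def ancestors(n):
    """↑+ in proximity order: nearest ancestor first, as positions on reverse axes."""
    out = []
    while n.parent is not None:
        n = n.parent
        out.append(n)
    return out

def first_child(n):
    """↓::*[1]: the first child element of n, or None."""
    return n.children[0] if n.children else None

def first_ancestor_with(n, label, text):
    """↑+::*[↓::label = text][1]: nearest ancestor having such a child."""
    for m in ancestors(n):
        if any(c.label == label and c.text == text for c in m.children):
            return m
    return None

def same_node(left, right):
    """[left = right] as node comparison: exactly one, identical node on each side."""
    return len(left) == 1 and len(right) == 1 and left[0] is right[0]

root = Node("root")
x = Node("X", root)
b1 = Node("B", x, text="topo")
y = Node("Y", x)
z = Node("Z", y)
print(first_ancestor_with(z, "B", "topo").label)  # X
```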

We summarize the augmented fragments of X by the following subsets: X^↑ (X with the self and upward axes), X^↑_[n] (X^↑ with the position predicate), and the final fragment X^↑_[n,=] (X^↑_[n] with the node comparison predicate).

It should be noted that, for a given query Q, a rewriting technique must ensure the following conditions: (i) each subquery⁵ of Q refers only to accessible element nodes; and (ii) each relationship defined between two subqueries of Q is respected. For instance, the query ↓+::D/↓::* over the access specification depicted in Figure 2(b) must return only accessible element nodes of type G or E which have an accessible D element as parent.

We define the accessibility problem as: "When is an element node of a given type accessible?". Clearly, the function σ cannot solve this problem because of the infinitely many possible paths induced by recursive views (see Example 2.2). We show in the following that the accessibility of a given element node w.r.t. a given recursive view cannot be defined in fragment X (not even in X^↑). We investigate the use of the augmented fragment X^↑_[n] to solve the accessibility problem in particular, and of the fragment X^↑_[n,=] as a way to avoid the non-closure of XPath fragment X in general.

2.5 Notations 

Given an access specification S=(D, ann) and a document T conforming to D, we define two predicates A_r^ann and A_q^ann as follows⁶:

$$\mathcal{A}_r^{ann} := \uparrow^{*}::*\Big[\epsilon::\mathit{root} \;\vee \bigvee_{\mathit{ann}(A',A)\in \mathit{ann}} \epsilon::A/\uparrow::A'\Big][1]\Big[\epsilon::\mathit{root} \;\vee \bigvee_{(\mathit{ann}(A',A)=Y\mid[Q])\in \mathit{ann}} \epsilon::A.\mathit{ann}(A',A)/\uparrow::A'\Big]$$

$$\mathcal{A}_q^{ann} := \bigwedge_{\mathit{ann}(A',A)=[Q]\in \mathit{ann}} \mathrm{not}\big(\uparrow^{+}::A[\mathrm{not}(Q)]/\uparrow::A'\big)$$

⁵ For example, the query ↓+::B[↓::C] contains two subqueries ↓+::B and ↓::C, with a parent/child relation defined between B and C, and an ancestor/descendant relation defined between B and the context node at which the query is posed.

⁶ Note that A.ann(A', A) gives A[Q] if ann(A', A)=[Q], and A otherwise.
where ⋀ and ⋁ denote conjunction and disjunction respectively. The predicate A_r^ann has the form ↑*::*[qual_1][1][qual_2]. Applying ↑*::*[qual_1] to an element node n of T returns an ordered set S of element nodes (n and/or some of its ancestor elements), each of which is either the root or concerned by an annotation. Thus, with S[1][qual_2] (n ⊨ A_r^ann) we ensure that the first element node in S is concerned by a valid annotation. With the second predicate, n ⊨ A_q^ann ensures that all qualifiers defined over ancestor elements of n are valid (we discuss this restriction in Section 5.2). These predicates are "powerful tools" for solving the accessibility problem, as we will see in the next section.
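Evaluating these two predicates amounts to walking up the ancestor-or-self chain of n. This sketch assumes annotations are encoded as a dictionary from (A', A) to "Y", "N", or a Python predicate standing in for [Q]; `satisfies_A_r` and `satisfies_A_q` are hypothetical names, and the code mirrors our reading of the formulas rather than an XPath engine.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []
        if parent is not None:
            parent.children.append(self)

def _annotation(m, ann):
    """ann(A', A) for a node m of type A whose parent has type A', or None."""
    return ann.get((m.parent.label, m.label)) if m.parent is not None else None

def satisfies_A_r(n, ann):
    """n ⊨ A_r^ann, i.e. ↑*::*[qual1][1][qual2], evaluated by walking up from n."""
    m = n
    # [qual1][1]: nearest ancestor-or-self that is the root or is annotated.
    while m.parent is not None and _annotation(m, ann) is None:
        m = m.parent
    if m.parent is None:                  # ε::root, and ann(root) = Y by default
        return True
    value = _annotation(m, ann)           # [qual2]: that annotation must be valid
    return value == "Y" or (callable(value) and bool(value(m)))

def satisfies_A_q(n, ann):
    """n ⊨ A_q^ann: no strict ancestor A (child of A') violates its qualifier Q."""
    m = n.parent
    while m is not None:
        value = _annotation(m, ann)
        if callable(value) and not value(m):
            return False
        m = m.parent
    return True

root = Node("root"); a = Node("A", root)
b, c = Node("B", a), Node("C", a)
d1, d2 = Node("D", b), Node("D", c)
ann = {("A", "B"): "N", ("A", "C"): "N", ("C", "D"): "Y"}
print(satisfies_A_r(d1, ann), satisfies_A_r(d2, ann))  # False True
```

If a failing qualifier is attached to A, the D under C still satisfies A_r^ann but no longer satisfies A_q^ann, which is exactly the case the second predicate is meant to catch.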



Since the σ function is not computable in the case of recursion, the parent/child relation defined between two element types in the query (e.g. the query ↓+::A/↓::B defines a parent/child relation between A and B) cannot be rewritten in X. Accordingly, we define two predicates A_↑ and A_B to rewrite parent/child relations:

$$\mathcal{A}_{\uparrow} := \uparrow^{+}::*[\mathcal{A}_r^{ann}] \qquad\qquad \mathcal{A}_B := \mathcal{A}_{\uparrow}[1]/\epsilon::B$$

For an element node n in T, n[[A_↑]] returns the set of all accessible ancestor elements of n. The element node n has an accessible B element as parent (in the view) if and only if n ⊨ A_B.
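A_↑ and A_B can likewise be mirrored by a walk over the ancestors. In this Python sketch, the parameter `is_accessible` is a hypothetical stand-in for the test n ⊨ A_r^ann, so that the example stays independent of any particular annotation; the function names are ours.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label, self.parent = label, parent

def accessible_ancestors(n, is_accessible):
    """n[[A_↑]]: accessible strict ancestors of n, nearest first."""
    out, m = [], n.parent
    while m is not None:
        if is_accessible(m):
            out.append(m)
        m = m.parent
    return out

def has_view_parent(n, label, is_accessible):
    """n ⊨ A_B: the nearest accessible ancestor of n exists and has type `label`."""
    anc = accessible_ancestors(n, is_accessible)
    return bool(anc) and anc[0].label == label

# root -> A -> C -> D with C hidden: in the view, D becomes a child of A.
root = Node("root"); a = Node("A", root); c = Node("C", a); d = Node("D", c)
hide_c = lambda m: m.label != "C"
print(has_view_parent(d, "A", hide_c))  # True
print(has_view_parent(d, "C", hide_c))  # False
```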

We use these four predicates throughout the paper to formalize our solution. 



3 Access Control with Recursive DTDs 

Our access control framework is presented in Figure 3. For each class of users, the administrator defines an access specification S=(D, ann) over the DTD D. The DTD view D_v is derived first and given to the users to formulate their queries. For each instance T of D, we compute a virtual⁷ view T_v of T that shows only accessible data. Each X query Q over T_v is efficiently rewritten, using the security view V (defined below in Section 3.1), into an equivalent X^↑_[n,=] query Q_t over T, in order to return only accessible data.



3.1 Recursive Security Views 

We redefine the security view over an access specification S=(D, ann) to be V=(D_v, ann), where D_v is the view of D computed by algorithm DeriveView, shown in Figure 4, and used by the users to formulate their queries.

We first use a DTD parser⁸ to load the DTD D into an expressive indexed structure such that, for each element type A in D, the sets of its child types and descendant types can be retrieved; the content model P(A) is also represented as a tree in which all the sub-expressions composing P(A) are identified, as we explain next.

We define the recursive function Exp(A, access) that, according to a given access specification S=(D, ann), extracts the content model of the element type A and of all its descendant types in D. For each element type A parsed by Exp, we eliminate all inaccessible subelement types B_i of A (if ann(A, B_i)=N exists) to compute the new content model P_v(A). The value access represents the inherited accessibility. For a given content model "A → B_1, (B_2|B_3)", we refer by G_1 op_A G_2 to the sub-expressions B_1 and B_2|B_3 respectively, separated by op_A = ",". The list Parsed is used (i) to store the extracted content model of each element type in D; and (ii) to avoid parsing the same element type more than once. By invoking Exp(root, true) in algorithm DeriveView, the content models of all element types of D are computed. The value Parsed(A, true)=∅ indicates that the element type A is not accessible, while the value Parsed(A, true)=ε indicates that the content model of A is the empty word. The output of algorithm DeriveView is a DTD view D_v = (Ele_v, P_v, root), where Ele_v is computed by eliminating inaccessible element types from D, and P_v returns the content model of each (accessible) element type in D_v.

⁷ The views of T are never materialized.

⁸ Available at: http://www.rpbourret.com/dtdparser/index.htm

Figure 3: XML Access Control Framework.

The complexity of our DTD view derivation algorithm, DeriveView, is given by the following theorem:

Theorem 2 Let S=(D, ann) be an access specification, and let P' be the largest production in D. Then the view D_v of D can be derived w.r.t. S in at most O(|D| · |P'|) time. □

Proof. For an element type A in D, we denote by |P(A)| the number of all subelement types and operators ("," or "|") defining P(A). The procedure Exp of Figure 4 works over the hierarchical parse-tree representation of the regular expression P(A). This tree is given by the DTD parser cited above; its intermediate nodes represent operators and its leaves are the subelement types of A. Each operator links two or more element types/sub-expressions. To compute the new content model of A, we parse all the element types B_i (i.e. the leaves) of the P(A) tree to eliminate each inaccessible B_i (i.e. each B_i for which ann(A, B_i)=N exists). Next, each operator node with no children is eliminated. Finally, the resulting tree is translated into a regular expression which represents the content model P_v(A). These steps visit all the nodes of the P(A) tree once, in O(|P(A)|) time. If P' is the largest production in D, then O(|P(A)|) is bounded by O(|P'|), and the content models of all element types of D are computed in at most O(|D| · |P'|) time. □
Algorithm: DeriveView

input : an access specification S=(D, ann) with D=(Ele, P, root).
output: a DTD view D_v.

  (Ele_v, P_v) := (∅, ∅);
  Exp(root, true);
  foreach element type A ∈ D do
      if Parsed(A, true) ≠ ∅ then
          Ele_v := Ele_v ∪ {A};
          P_v(A) := Parsed(A, true);
  D_v := (Ele_v, P_v, root);
  return D_v;

Procedure: Exp(A, access)

Input: an element type A, inherited accessibility access.
Output: content model of A.

  if Parsed(A, access) ≠ null then
      return Parsed(A, access);
  exp := ∅;
  case P(A) is str or ε:
      if access then
          exp := str (ε resp.);
  case P(A) is G_1 op_A ... op_A G_n:        // op_A is "|" or ","
      foreach subexpression G_i do
          if G_i = B then                    // case of a single element type
              if ann(A, B) ∉ ann then
                  if access then
                      exp := exp op_A B; Exp(B, true);
                  else if Exp(B, false) ≠ ∅ then
                      exp := exp op_A Exp(B, false);
              else if ann(A, B) = Y then
                  exp := exp op_A B; Exp(B, true);
              else if ann(A, B) = N and Exp(B, false) ≠ ∅ then
                  exp := exp op_A Exp(B, false);
              else                           /* ann(A, B) = [Q] */
                  exp := exp op_A (B|ε);
                  Exp(B, true);
          else if G_i = B* then
              similar to the previous case, except that in exp, B is replaced with B*
              (also for Exp(B, true) and Exp(B, false));
          else                               // G_i is a composition of element types/sub-expressions
              define A' in D as a temporary element type;
              define content model P(A') := G_i;
              if Exp(A', access) ≠ ∅ then
                  exp := exp op_A Exp(A', access);
              delete A' from D and the Parsed(A', access) entry;
  if (exp = ∅ and access) then exp := ε;
  Parsed(A, access) := exp;
  return exp;



Figure 4: DTD View Derivation Algorithm. 
Figure 5: Example of Instance view.



3.2 Accessibility 

Now we define the element node accessibility based on the use of recursive views: 

Definition 3.1 Given a security view $V = (D_v, ann)$ and a document $T$ conforming to the original DTD $D$, an element node $n$ of $T$ of type $B$ whose parent node is of type $A$ is accessible (i.e. shown in the view $T_v$ of $T$) if and only if the following conditions hold:

i) The element node $n$ is concerned by a valid annotation; or $ann(A, B)$ does not exist and there is an annotation defined over an ancestor element $n'$ of $n$ such that $n'$ is the first ancestor element of $n$ concerned by an annotation, and this annotation is valid at $n'$.

ii) For each ancestor element $n'$ of $n$ concerned by an annotation with value $[Q']$, $n' \models Q'$ holds. □
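Read operationally, Definition 3.1 walks up from $n$ to the first node concerned by an annotation (condition (i)) and then checks every qualifier annotation on the ancestors of $n$ (condition (ii)). The Python sketch below assumes a hypothetical Node class with parent pointers, annotations given as "Y", "N" or a Python callable standing in for an XPath qualifier [Q], and treats the root as accessible; it evaluates the definition directly on an instance tree rather than through the XPath predicates used later by the rewriting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple, Union


@dataclass
class Node:
    label: str
    parent: Optional["Node"] = None


# "Y", "N", or a Python predicate standing in for a qualifier [Q] (hypothetical)
Annotation = Union[str, Callable[[Node], bool]]
Ann = Dict[Tuple[str, str], Annotation]


def annotation_of(n: Node, ann: Ann) -> Optional[Annotation]:
    """ann(A, B) for a node of type B whose parent has type A, or None."""
    if n.parent is None:
        return None
    return ann.get((n.parent.label, n.label))


def accessible(n: Node, ann: Ann) -> bool:
    # Condition (i): the first node among n, parent(n), ... that is concerned
    # by an annotation must carry a valid one (the root counts as accessible).
    m = n
    while m.parent is not None and annotation_of(m, ann) is None:
        m = m.parent
    a = annotation_of(m, ann)
    if a is not None and not (a == "Y" or (callable(a) and a(m))):
        return False
    # Condition (ii): every proper ancestor concerned by [Q'] must satisfy Q'.
    m = n.parent
    while m is not None:
        a = annotation_of(m, ann)
        if callable(a) and not a(m):
            return False
        m = m.parent
    return True


root = Node("root")
node_a = Node("A", root)
d = Node("D", node_a)
e = Node("E", d)
f = Node("F", d)
h = Node("H", f)
ann: Ann = {("root", "A"): lambda node: True, ("D", "E"): "Y", ("D", "F"): "N"}
print(accessible(e, ann), accessible(h, ann))   # True False
```

Replacing the qualifier on (root, A) by one that fails makes E inaccessible through condition (ii), even though its own annotation is Y.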

Example 3.1 Consider the DTD depicted in Figure 2(b), where the annotation $q$ is $\downarrow::D$. For an element node $n$ of type $H$ within an instance of this DTD, if its parent element is of type $C$ then $n$ is not accessible. Otherwise, the first ancestor element of $n$ which is concerned by an annotation can be either of type $F$ (i.e. $ann(D, F) = N$), of type $E$ (i.e. $ann(D, E) = Y$), or of type $A$ (i.e. $ann(root, A) = [q]$). This means that $n$ may be accessible only if this first ancestor element is of type $E$ or $A$, and $n$ has no ancestor element $n'$ of type $A$ with $n' \not\models q$.

Note that the element node accessibility over recursive XML views cannot be defined in $\mathcal{X}^{\uparrow}$. We consider the access specification $S = (D, ann)$ composed of the DTD of Figure 2(b) and the annotations depicted on its edges. Figure 5 represents (a) an instance $T$ of $D$ and (b) its view $T_v$ computed according to $S$. The query $\downarrow^{+}::H$ over $T_v$ must be rewritten to return only the $H$ node that is accessible w.r.t. $S$. However, $\downarrow^{+}::H[\uparrow^{+}::E \text{ or } \uparrow^{+}::A[q]]$ returns both $H$ nodes of $T$, and $\downarrow^{+}::H[(\uparrow^{+}::E \text{ or } \uparrow^{+}::A[q]) \text{ and not}(\uparrow^{+}::F)]$ rejects the accessible node shown in $T_v$. □



We use the predicates $\mathcal{A}^{acc}_1$ and $\mathcal{A}^{acc}_2$ defined in Section 2.5 to satisfy the accessibility conditions (i) and (ii), respectively, of Definition 3.1.
Definition 3.2 For any security view $V = (D_v, ann)$ and any instance $T$ conforming to the original DTD $D$, we define the accessibility predicate $\mathcal{A}^{acc}$, an $\mathcal{X}^{\uparrow}_{[n]}$ qualifier such that an element node $n$ of $T$ is accessible iff $n \models \mathcal{A}^{acc}$, with $\mathcal{A}^{acc} := \mathcal{A}^{acc}_1 \wedge \mathcal{A}^{acc}_2$. □

For an element type $B$ in a DTD $D$, $\downarrow^{+}::B[\mathcal{A}^{acc}]$ stands for all accessible $B$ elements in an instance of $D$.



Example 3.2 Consider the access specification depicted in Figure 2(b). The predicates $\mathcal{A}^{acc}_1$ and $\mathcal{A}^{acc}_2$ are defined as follows:

$$\mathcal{A}^{acc}_1 := \uparrow^{*}::*[\epsilon::root \vee \epsilon::A/\uparrow::root \vee \epsilon::E/\uparrow::D \vee \epsilon::F/\uparrow::D \vee \epsilon::H/\uparrow::C][1][\epsilon::root \vee \epsilon::A[q]/\uparrow::root \vee \epsilon::E/\uparrow::D]$$

$$\mathcal{A}^{acc}_2 := \text{not}(\uparrow^{+}::A[\text{not}(q)]/\uparrow::root)$$

Consider the element node of type $H$ of the XML document illustrated in Figure 5(a) whose first annotated ancestor is of type $F$. At this node, $\uparrow^{*}::*[\epsilon::A/\uparrow::root \vee \epsilon::E/\uparrow::D \vee \epsilon::F/\uparrow::D \vee \epsilon::H/\uparrow::C \vee \epsilon::root]$ returns the ordered set $S$ of element nodes (the element node itself and/or some of its ancestor elements) for each of which an annotation exists; here $S$ consists of an $F$ element, an $E$ element, an $A$ element and $root$ (e.g. $ann(D, F) = N$ for the $F$ element). Note that $S[1]$ returns the $F$ ancestor, so the predicate $\mathcal{A}^{acc}_1$ is not satisfied at this $H$ node: the first ancestor element concerned by an annotation in $S$ is not accessible (the annotation of this $F$ element is not valid). The query $\downarrow^{+}::H[\mathcal{A}^{acc}]$ over the instance $T$ of Figure 5(a) returns only the accessible $H$ element (shown in the view $T_v$ of Figure 5(b)). □

Property 1. For any security view $V = (D_v, ann)$, the accessibility predicate $\mathcal{A}^{acc}$ can be constructed in $O(|ann|)$ time. □

Proof. For any security view $V = (D_v, ann)$, the construction of $\mathcal{A}^{acc}_1$ and $\mathcal{A}^{acc}_2$ depends only on the parsing of all annotations $ann$ of $V$, which is done in $O(|ann|)$ time. □
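The one-pass construction in the proof can be illustrated by generating $\mathcal{A}^{acc}_1$ and $\mathcal{A}^{acc}_2$ as strings. This Python sketch assumes a hypothetical encoding in which each annotation is "Y", "N", or a qualifier written "[q]", and uses the same ordering of disjuncts as Example 3.2; it only builds the predicate text and does not evaluate it. Returning true() for an empty conjunction is a choice made for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: ann[(A, B)] is "Y", "N", or a qualifier written "[q]".
Ann = Dict[Tuple[str, str], str]


def build_acc(ann: Ann) -> Tuple[str, str]:
    """Build A1^acc and A2^acc as strings in one pass over ann (O(|ann|))."""
    concerned: List[str] = ["ε::root"]   # nodes concerned by some annotation
    valid: List[str] = ["ε::root"]       # nodes concerned by a valid annotation
    conjuncts: List[str] = []            # one not(...) per qualifier annotation
    for (a, b), value in ann.items():
        concerned.append(f"ε::{b}/↑::{a}")
        if value == "Y":
            valid.append(f"ε::{b}/↑::{a}")
        elif value.startswith("["):
            q = value[1:-1]
            valid.append(f"ε::{b}[{q}]/↑::{a}")
            conjuncts.append(f"not(↑+::{b}[not({q})]/↑::{a})")
    a1 = "↑*::*[" + " ∨ ".join(concerned) + "][1][" + " ∨ ".join(valid) + "]"
    a2 = " ∧ ".join(conjuncts) if conjuncts else "true()"
    return a1, a2


# Annotations of Example 3.2
ann = {("root", "A"): "[q]", ("D", "E"): "Y", ("D", "F"): "N", ("C", "H"): "N"}
a1, a2 = build_acc(ann)
print(a1)
print(a2)
```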

4 Query Rewriting over Recursive XML Views 

In this section we describe our XPath-based query rewriting algorithm. Let $S = (D, ann)$ be an access specification, $V = (D_v, ann)$ the security view of $S$, $T$ an instance conforming to $D$, and $T_v$ its virtual view computed w.r.t. $V$. For any query $p$ over $T_v$, the goal of query rewriting is to find a rewriting function, defined as:

$$\text{Rewrite}: \mathcal{X} \rightarrow \mathcal{X}^{\uparrow}_{[n,=]}, \quad p \mapsto \text{Rewrite}(p) \text{ such that } p(T_v) = \text{Rewrite}(p)(T)$$

Our rewriting function Rewrite ensures that only accessible element nodes are referred to by the subqueries of $p$, which is guaranteed by the accessibility predicate of Definition 3.2. Moreover, the relationships defined between each two subqueries of $p$ must be respected.
Figure 6: Query Rewriting Problems.



Notice that, for a given security view $V = (D_v, ann)$, we first compute the predicates $\mathcal{A}^{acc}$ and $\mathcal{A}^{+}$ w.r.t. $V$ in $O(|ann|)$ time (Properties 1 and 2, respectively). Also, for each element type $A$ in $D_v$, we compute the lists of its children types and descendant types, denoted by $Reach(\downarrow, A)$ and $Reach(\downarrow^{+}, A)$ respectively. Each list is computed in $O(|D_v|)$ time, so the lists of all element types of $D_v$ are computed in $O(|D_v|^2)$ time. This preprocessing step is done only once, after the security view $V$ is defined, and it provides performance gains during the query rewriting step.
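The precomputation of the Reach lists can be sketched with one breadth-first search per element type of $D_v$, which gives the $O(|D_v|)$ cost per list and $O(|D_v|^2)$ overall. In this Python sketch the DTD view is encoded, hypothetically, as a dictionary from each element type to the element types occurring in its content model; the small view used in the example is illustrative and is not claimed to be the view of Figure 6(b).

```python
from collections import deque
from typing import Dict, List, Set

DTDView = Dict[str, List[str]]   # hypothetical: element type -> child types in D_v


def reach_lists(dv: DTDView) -> Dict[str, Dict[str, Set[str]]]:
    """Reach(↓, A) and Reach(↓+, A) for every element type A of the view."""
    types = set(dv) | {b for bs in dv.values() for b in bs}
    result: Dict[str, Dict[str, Set[str]]] = {}
    for a in types:
        children = set(dv.get(a, []))
        descendants: Set[str] = set()
        queue = deque(children)          # breadth-first search from A's children
        while queue:
            b = queue.popleft()
            if b in descendants:
                continue
            descendants.add(b)
            queue.extend(dv.get(b, []))
        result[a] = {"↓": children, "↓+": descendants}
    return result


dv = {"root": ["A"], "A": ["A", "D", "E"], "D": ["E"]}
lists = reach_lists(dv)
print(sorted(lists["A"]["↓"]), sorted(lists["root"]["↓+"]))   # ['A', 'D', 'E'] ['A', 'D', 'E']
```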

In this section, we consider the DTD view $D_v$ shown in Figure 6(b), which is derived from the DTD $D$ of Figure 6(a) with respect to the access specification depicted on its edges. Figure 6(c) represents a valid instance $T$ of $D$, and its derived view is depicted in Figure 6(d).



4.1 Queries Without Predicates 

For a DTD $D = (Ele, P, root)$, we discuss the rewriting of queries without predicates, of the form $axis_1::E_1/\ldots/axis_n::E_n$, where $E_i \in Ele \cup \{*\}$ and $axis_i$ is an XPath axis of fragment $\mathcal{X}$. Given the query $\downarrow^{+}::E_i/\downarrow^{+}::E_j$, it is clear that the rewritten query can be $\downarrow^{+}::E_i[\mathcal{A}^{acc}]/\downarrow^{+}::E_j[\mathcal{A}^{acc}]$, which returns the accessible $E_j$ elements having at least one accessible ancestor element of type $E_i$. However, it is not so simple in the case of the child axis.

Example 4.1 The query $\downarrow^{+}::A/\downarrow::E$ over the view of Figure 6(d) returns the $E$ elements having an $A$ element as parent, so two $E$ elements are returned. Using the Kleene closure, this query can be rewritten over the instance $T$ of Figure 6(c) into $\downarrow^{+}::A/\downarrow::B/\downarrow::D/(\downarrow::B/\downarrow::D)^{*}/\downarrow::E$. However, in standard XPath, a cycle in the DTD cannot simply be replaced by $\downarrow^{+}$. For instance, the query $\downarrow^{+}::A[\mathcal{A}^{acc}]/\downarrow^{+}::E[\mathcal{A}^{acc}]$ returns three $E$ elements, one of which does not have an $A$ parent in $T_v$. □



We use the predicate $\mathcal{A}^{B}$ ($B$ can be any element type) defined in Section 2.5 to rewrite the parent/child relation. For instance, the query $\downarrow^{+}::A/\downarrow::E$ of the previous example can be rewritten into $\downarrow^{+}::E[\mathcal{A}^{acc}][\mathcal{A}^{A}]$ to return only the accessible $E$ elements (verified with $\mathcal{A}^{acc}$) which have an accessible parent of type $A$ in the view (verified with $\mathcal{A}^{A}$).
Example 4.2 Given the access specification of Figure 6(a), we define the predicate $\mathcal{A}^{+}$ as follows:

$$\mathcal{A}^{+} := \uparrow^{+}::*[\uparrow^{*}::*[\epsilon::E/\uparrow::D \vee \epsilon::B/\uparrow::D \vee \epsilon::D/\uparrow::C \vee \epsilon::B/\uparrow::A \vee \epsilon::root][1][\epsilon::E/\uparrow::D \vee \epsilon::D/\uparrow::C \vee \epsilon::root]]$$

At an element node $n$ of type $E$ of Figure 6(c), $\mathcal{A}^{+}$ returns the ordered set of the accessible ancestor elements of $n$, which we denote by $n[\mathcal{A}^{+}]$. The predicate $\mathcal{A}^{A}$ (i.e. $\mathcal{A}^{+}[1]/\epsilon::A$) does not hold at an $E$ node whose first accessible ancestor element is of type $D$ (i.e. $\mathcal{A}^{+}[1]$ at that node returns a $D$ element). However, $\mathcal{A}^{A}$ holds at the two $E$ nodes whose first accessible ancestor is an $A$ element. □

Property 2. For any security view $V = (D_v, ann)$ and any element type $B$ in $D_v$, the predicates $\mathcal{A}^{+}$ and $\mathcal{A}^{B}$ can be constructed in $O(|ann|)$ time. □

Proof. The same principle as in the proof of Property 1 applies. □

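Finally, given a (recursive) security view $V = (D_v, ann)$, we define the rewriting function $\text{Rewrite}: \mathcal{X} \times Ele \rightarrow \mathcal{X}^{\uparrow}_{[n]}$, which we use to rewrite an $\mathcal{X}$ query $p = p_1/\ldots/p_n$ (where each subquery $p_i$ is of the form $axis_i::E_i$) over a context node of type $E$ in $D_v$ into an equivalent query $\text{Rewrite}(p, E)$, defined in $\mathcal{X}^{\uparrow}_{[n]}$ over the original DTD as:

$$\text{Rewrite}(p, E) := \downarrow^{+}::E_n[\mathcal{A}^{acc}][\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_n)]$$

where the qualifier $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_n)$ is recursively defined over the descending list of subqueries of $p$. For each subquery $p_i$, $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_{i-1})$ is already computed and is used to compute $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i)$ as follows (with $E_0 = E$):

• $axis_i = \downarrow$: $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i) := \mathcal{A}^{E_{i-1}}[\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_{i-1})]$

• $axis_i = \downarrow^{+}$: $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i) := \uparrow^{+}::E_{i-1}[\mathcal{A}^{acc}][\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_{i-1})]$

Recall that $E$ is the type of the context node at which the query is evaluated. As a special case, we have $\mathit{prefix}^{-1}_{E}(\downarrow::E_1) = \mathcal{A}^{E}$ and $\mathit{prefix}^{-1}_{E}(\downarrow^{+}::E_1) = \uparrow^{+}::E[\mathcal{A}^{acc}]$.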
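The recursive definition of $\mathit{prefix}^{-1}_{E}$ translates directly into a loop over the subqueries. The following Python sketch builds the rewritten query as a string for predicate-free queries. It writes $\mathcal{A}^{B}$ in its expanded form $\mathcal{A}^{+}[1]/\epsilon::B$, keeps A^acc and A^+ as opaque placeholder names, and uses a hypothetical (axis, element type) encoding of the subqueries; unlike Example 4.4, it does not omit the always-true test on root. On the query of Example 4.3 it reproduces the rewriting given there.

```python
from typing import List, Tuple

Step = Tuple[str, str]   # (axis, element type), axis in {"↓", "↓+"}


def rewrite(steps: List[Step], context: str) -> str:
    """Rewrite(p, E) for a predicate-free query p = p1/.../pn (Section 4.1)."""
    if not steps:
        raise ValueError("empty query")
    prefix = ""                      # prefix^{-1}_E of the empty prefix
    prev = context                   # E_{i-1}, starting with E_0 = E
    for axis, label in steps:
        inner = f"[{prefix}]" if prefix else ""
        if axis == "↓":
            prefix = f"A^+[1]/ε::{prev}" + inner       # A^{E_{i-1}}[...]
        elif axis == "↓+":
            prefix = f"↑+::{prev}[A^acc]" + inner      # ↑+::E_{i-1}[A^acc][...]
        else:
            raise ValueError(f"axis {axis!r} is not in fragment X")
        prev = label
    return f"↓+::{prev}[A^acc][{prefix}]"


print(rewrite([("↓", "A"), ("↓", "E")], "root"))
# ↓+::E[A^acc][A^+[1]/ε::A[A^+[1]/ε::root]]
```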

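Example 4.3 Consider the query $\downarrow::A/\downarrow::E$ over the context node $root$ of the view $T_v$ of Figure 6(d). Using our algorithm Rewrite, we obtain:

$$\text{Rewrite}(\downarrow::A/\downarrow::E, root) = \downarrow^{+}::E[\mathcal{A}^{acc}][\mathcal{A}^{A}[\mathcal{A}^{root}]] = \downarrow^{+}::E[\mathcal{A}^{acc}][\mathcal{A}^{+}[1]/\epsilon::A[\mathcal{A}^{+}[1]/\epsilon::root]]$$

$\mathcal{A}^{+}$ is given in Example 4.2. At each of the three $E$ element nodes, the evaluation of $\mathcal{A}^{+}$ returns its ordered accessible ancestors: an $A$ element followed by $root$ for the first one, two $A$ elements followed by $root$ for the second one, and a $D$ element, an $A$ element and $root$ for the third one. With $\mathcal{A}^{A}$ we ensure that, for an element node $E$ referred to by the query, its first accessible ancestor element is of type $A$, which holds for the first two $E$ nodes and not for the third one in $T_v$ of Figure 6(d). Moreover, with $\mathcal{A}^{root}$ we ensure that the $A$ element returned by $\mathcal{A}^{A}$ (a different $A$ element for each of these two $E$ nodes) has $root$ as its first accessible ancestor, which holds only for the first one. Thus the query $\text{Rewrite}(\downarrow::A/\downarrow::E, root)$ at $root$ of Figure 6(d) returns a single $E$ element. □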
Algorithm: Rewrite

input : a query p, and an element type A for which query rewriting is carried out.
output: a rewritten query pt w.r.t. A.

1  if p = p_1 ∪ ... ∪ p_n then
2      return ∪_{1≤i≤n} Rewrite(p_i, A);
3  reach := {A};
4  compute the descending list L of subqueries of p;
5  // each subquery p_i = axis_i::E_i[f_i]
6  pt := ε; filters := ε;
   // compute prefix^{-1}_A(p) to be pt; A^{fs(reach, ε)} stands for A^+[1]/fs(reach, ε)
7  foreach p_i in the order of L do
       // axes rewriting
8      case axis_i = ↓:
9          if (filters = ε) then
10             pt := A^{fs(reach, ε)}[pt];
11         else
12             pt := A^{fs(reach, ε)}[filters][pt];
13     case axis_i = ↓+:
14         if (filters = ε) then
15             pt := fs(reach, ↑+)[A^acc][pt];
16         else
17             pt := fs(reach, ↑+)[A^acc][filters][pt];
       // *-label elimination
18     if (E_i = *) then
19         reach := ∪_{E ∈ reach} Reach(axis_i, E);
20     else if (∃ E ∈ reach s.t. E_i ∈ Reach(axis_i, E)) then
21         reach := {E_i};
22     else
23         reach := {};
24     if (reach = {}) then return ∅;
       // rewriting of predicate f_i over reach elements
25     filters := RW_Pred(f_i, reach);
26     if (filters = false) then    // invalid predicate
27         return ∅;
28     else if (filters = true) then    // omit f_i from p_i
29         filters := ε;
   // rewritten query pt of p w.r.t. A
30 if (filters = ε) then
31     pt := fs(reach, ↓+)[A^acc][pt];
32 else
33     pt := fs(reach, ↓+)[A^acc][filters][pt];
34 return pt;

Figure 7: Algorithm for XPath Queries Rewriting.



The details of function Rewrite are given in Figure 7. After computing the descending list $L$ of the subqueries of $p$, we parse them to generate the $\mathit{prefix}^{-1}$ of the rewritten query, as explained above. The context node type $A$ can be initialized to the root of the DTD for rewriting $p$ over the entire document. If $p_i$ is $axis_i::*$, then the *-label is replaced by the set of children (resp. descendant) types of $E_{i-1}$ when $axis_i$ is $\downarrow$ (resp. $\downarrow^{+}$). Hence, the rewriting of $p_i$ over $p_{i-1}$ can result in a set of element types, denoted by reach. Moreover, if $E_i \neq *$, then there must exist at least one element type $E$ in reach (the result of the rewriting of $p_{i-1}$) such that $E_i$ is a child type of $E$ if $axis_i = \downarrow$, or one of its descendant types if $axis_i = \downarrow^{+}$ (lines 18-23). If the rewriting of $p_i$ over $p_{i-1}$ yields an empty set reach, then the query is rejected (line 24). By $fs$ we refer to the fusion function, e.g. $fs(\{E_1, \ldots, E_n\}, \uparrow^{+}) = (\uparrow^{+}::E_1 \cup \ldots \cup \uparrow^{+}::E_n)$.
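Lines 18-24 of Rewrite and the fusion function $fs$ can be sketched in Python as follows. The precomputed Reach lists are assumed to be given as a dictionary indexed by element type and axis (a hypothetical encoding), the example lists are illustrative, and None stands for the rejection of the query at line 24. Sorting the types in fs is only done to make the output deterministic.

```python
from typing import Dict, Optional, Set

# Lists[E]["↓"] = Reach(↓, E), Lists[E]["↓+"] = Reach(↓+, E) (hypothetical encoding)
Lists = Dict[str, Dict[str, Set[str]]]


def update_reach(reach: Set[str], axis: str, label: str, lists: Lists) -> Optional[Set[str]]:
    """Lines 18-24 of Rewrite: the new reach set, or None if p is rejected."""
    if label == "*":
        new: Set[str] = set().union(*(lists[e][axis] for e in reach))
    elif any(label in lists[e][axis] for e in reach):
        new = {label}
    else:
        new = set()
    return new if new else None


def fs(types: Set[str], axis: str) -> str:
    """Fusion: fs({E1, ..., En}, axis) = (axis::E1 ∪ ... ∪ axis::En)."""
    return "(" + " ∪ ".join(f"{axis}::{e}" for e in sorted(types)) + ")"


lists = {
    "A": {"↓": {"A", "D", "E"}, "↓+": {"A", "D", "E"}},
    "D": {"↓": {"E"}, "↓+": {"E"}},
    "E": {"↓": set(), "↓+": set()},
}
star = update_reach({"A"}, "↓", "*", lists)
print(fs(star, "↑+"))                          # (↑+::A ∪ ↑+::D ∪ ↑+::E)
print(update_reach({"E"}, "↓", "A", lists))    # None: the query is rejected
```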

Function RW_Pred, called in algorithm Rewrite, performs the rewriting of predicates, which is the subject of the next section. As shown above, upward axes and the position predicate are necessary for rewriting simple queries (without predicates). We prove below that the fragment $\mathcal{X}^{\uparrow}_{[n]}$ is not closed under query rewriting. Extending this fragment with the node comparison operator (which results in the final fragment $\mathcal{X}^{\uparrow}_{[n,=]}$) turns out to be sufficient to rewrite any query in $\mathcal{X}$.

Theorem 3 For recursive XML security views, the XPath fragment $\mathcal{X}^{\uparrow}_{[n]}$ is not closed under query rewriting. □

Proof (by contradiction). We consider queries of the form $E[q]$, which expose the rewriting limitation of the XPath fragment $\mathcal{X}^{\uparrow}_{[n]}$. Assume that query rewriting can be done in $\mathcal{X}^{\uparrow}_{[n]}$. The query $\downarrow^{+}::A[\downarrow::E]$ over the view instance $T_v$ depicted in Figure 6(d) cannot be correctly rewritten, using the previous definition of algorithm Rewrite, into $\text{Rewrite}(\downarrow^{+}::A, root)[\text{Rewrite}(\downarrow::E, A)]$, which is equivalent to $\downarrow^{+}::A[\mathcal{A}^{acc}][\downarrow^{+}::E[\mathcal{A}^{acc}][\mathcal{A}^{A}]]$. Indeed, the resulting query returns three $A$ elements, but one of them does not have an immediate child $E$ in $T_v$. The limitation is due to the fact that the predicate $[\downarrow::E]$ must return all descendant elements $E$ whose first accessible ancestor is the context node $A$ at which the predicate is evaluated (i.e. the element node returned by $\downarrow^{+}::E/\mathcal{A}^{+}[1]$ must be the same element node of type $A$ at which the predicate is evaluated). This cannot be expressed in $\mathcal{X}^{\uparrow}_{[n]}$ and can only be done by introducing node-set comparison (e.g. $\mathcal{X}^{\uparrow}_{[n,=]}$), as we present in the following. □
4.2 Predicates Rewriting 

We explain the rewriting of predicates to complete the definition of our rewriting algorithm Rewrite. For a given query $axis_1::E_1[q_1]/\ldots/axis_n::E_n[q_n]$, we rewrite each predicate $q_i$ over the element type $E_i$ at which $q_i$ is defined. Consider a security view $V = (D_v, ann)$ and a subquery $E[q]$ (for clarity, we take a simple predicate $q = q_1/\ldots/q_n$ where $q_i = axis_i::E_i$). We define the function $\text{RW\_Pred}: \mathcal{X} \times Ele \rightarrow \mathcal{X}^{\uparrow}_{[n,=]}$ that rewrites the predicate $q$ in $\mathcal{X}$ over element type $E$ in $D_v$ into an equivalent one $\text{RW\_Pred}(q, E)$ in $\mathcal{X}^{\uparrow}_{[n,=]}$, recursively defined over the descending list of sub-predicates of $q$ (with $E_0 = E$) as follows:

• $axis_i = \downarrow$: $\text{RW\_Pred}(q_i/\ldots/q_n, E_{i-1}) := \downarrow^{+}::E_i[\mathcal{A}^{acc}][\text{RW\_Pred}(q_{i+1}/\ldots/q_n, E_i)]/\mathcal{A}^{+}[1] = \epsilon::E_{i-1}$

• $axis_i = \downarrow^{+}$: $\text{RW\_Pred}(q_i/\ldots/q_n, E_{i-1}) := \downarrow^{+}::E_i[\mathcal{A}^{acc}][\text{RW\_Pred}(q_{i+1}/\ldots/q_n, E_i)]$

Given a query $axis_i::E_i[axis_j::E_j = \text{"c"}]$ (text-content comparison), we have the following rewriting:

$$\text{RW\_Pred}(\downarrow::E_j = \text{"c"}, E_i) := \downarrow^{+}::E_j[\mathcal{A}^{acc}][\epsilon::* = \text{"c"}]/\mathcal{A}^{+}[1] = \epsilon::E_i$$
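The two RW_Pred cases and the text-content comparison can be turned into a small string generator for simple path predicates whose steps name explicit element types. This Python sketch assumes a hypothetical (axis, element type) encoding of the steps and treats A^acc and A^+ as placeholder names; the = in the generated text denotes the node comparison of $\mathcal{X}^{\uparrow}_{[n,=]}$, and the strings are only printed, not evaluated.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str]    # (axis, element type) with axis "↓" or "↓+"


def rw_pred(steps: List[Step], context: str, text: Optional[str] = None) -> str:
    """RW_Pred(q1/.../qn, E) for a simple path predicate with explicit types.

    If text is given, the last step is compared with the string "text".
    """
    if not steps:
        raise ValueError("empty predicate")
    axis, label = steps[0]
    if axis not in ("↓", "↓+"):
        raise ValueError(f"axis {axis!r} is not handled in this sketch")
    out = f"↓+::{label}[A^acc]"
    if steps[1:]:
        out += f"[{rw_pred(steps[1:], label, text)}]"
    elif text is not None:
        out += f'[ε::*="{text}"]'
    if axis == "↓":
        out += f"/A^+[1]=ε::{context}"      # first accessible ancestor is the context
    return out


print(rw_pred([("↓", "E")], "A"))            # ↓+::E[A^acc]/A^+[1]=ε::A
print(rw_pred([("↓", "E")], "A", "c"))       # ↓+::E[A^acc][ε::*="c"]/A^+[1]=ε::A
```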

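Example 4.4 The query $\downarrow^{+}::A[\downarrow::E]$, given in the proof of Theorem 3, is rewritten into:

$$\text{Rewrite}(\downarrow^{+}::A[\downarrow::E], root) = \downarrow^{+}::A[\mathcal{A}^{acc}][\uparrow^{+}::root[\mathcal{A}^{acc}]][\text{RW\_Pred}(\downarrow::E, A)] = \downarrow^{+}::A[\mathcal{A}^{acc}][\downarrow^{+}::E[\mathcal{A}^{acc}]/\mathcal{A}^{+}[1] = \epsilon::A]$$

where $\uparrow^{+}::root[\mathcal{A}^{acc}]$ is omitted since $root$ is always accessible. The rewritten query returns the two $A$ element nodes that have a child $E$ in the view $T_v$ of Figure 6(d). □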

The details of function RW_Pred are given in Figure 8. We have seen in algorithm Rewrite that the rewriting of a subquery $axis_{i+1}::*$ over $E_i$ results in a set of element types (reach) reachable from $E_i$. The predicate $q$ in $axis_i::E_i[q]$ is rewritten over the element type $E_i$ ($E_i \neq *$) as explained in the above definition of RW_Pred, while the predicate $q$ in $axis_i::E_i/axis_{i+1}::*[q]$ is rewritten over the set of element types resulting from the rewriting of $axis_{i+1}::*$ over $E_i$. We denote this set by $L$. For a given predicate $q_1/\ldots/q_n$ over element type $E$, $L := reach(q_1, \{E\})$ denotes the result of rewriting sub-predicate $q_1$ over $E$ (the element types reachable from $E$ with $q_1$); sub-predicate $q_2$ is then rewritten over $L$, resulting in a new set $L := reach(q_2, L)$ (i.e. $L := reach(q_2, reach(q_1, \{E\}))$), and so on until $q_n$ is rewritten over $L := reach(q_{n-1}, L)$. Each sub-predicate $q_i$ can contain other sub-predicates (case of $axis_i::E_i[f_i]$). The *-labels in the sub-predicates are eliminated using the same principle as in algorithm Rewrite, based on the precomputed lists of children and descendant types (Reach). The rewriting result of a predicate $q$ can be false if some element types in $q$ are inaccessible (they do not appear in $D_v$) or some relationships are not respected (e.g. the rewriting of the predicate $\downarrow::E_i$ over element type $E_{i-1}$ is false if $E_i$ is not a subelement type of $E_{i-1}$, i.e. $E_i \notin Reach(\downarrow, E_{i-1})$). The rewriting result can be true (the predicate is then omitted from the query) in the case of $\text{not}(q)$ with an invalid $q$.

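Example 4.5 Consider the query $\downarrow^{+}::A[\downarrow::*/\downarrow::D]$ over the DTD view of Figure 6(b). Using our algorithm RW_Pred, the predicate $[\downarrow::*/\downarrow::D]$ is rewritten over element type $A$ as follows:

$$[\text{RW\_Pred}(\downarrow::*/\downarrow::D, A)] = [(\downarrow^{+}::A \cup \downarrow^{+}::D \cup \downarrow^{+}::E)[\mathcal{A}^{acc}][\text{RW\_Pred}(\downarrow::D, \{A, D, E\})]/\mathcal{A}^{+}[1] = \epsilon::A]$$

$$= [(\downarrow^{+}::A \cup \downarrow^{+}::D \cup \downarrow^{+}::E)[\mathcal{A}^{acc}][\downarrow^{+}::D[\mathcal{A}^{acc}]/\mathcal{A}^{+}[1] = \epsilon::*]/\mathcal{A}^{+}[1] = \epsilon::A]$$

□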
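When *-labels appear, a predicate is rewritten over a set $L$ of element types rather than a single type. The following Python sketch does this for simple path predicates only, returning False when a step cannot be matched in $D_v$ (line 9 of RW_Pred). Only $Reach(\downarrow, A) = \{A, D, E\}$ is taken from Example 4.5; the lists for D and E, the dictionary encoding and the (axis, label) step encoding are hypothetical. With these lists it reproduces the rewriting of Example 4.5.

```python
from typing import Dict, List, Set, Tuple, Union

Lists = Dict[str, Dict[str, Set[str]]]   # precomputed Reach lists (hypothetical content)
Step = Tuple[str, str]                   # (axis, label); label may be "*"


def fs(types: Set[str], axis: str) -> str:
    """(axis::E1 ∪ ... ∪ axis::En), or axis::E for a single type."""
    if len(types) == 1:
        return f"{axis}::{next(iter(types))}"
    return "(" + " ∪ ".join(f"{axis}::{e}" for e in sorted(types)) + ")"


def rw_pred_set(steps: List[Step], L: Set[str], lists: Lists) -> Union[str, bool]:
    """Rewrite q1/.../qn over the set L; False means the predicate is invalid."""
    axis, label = steps[0]
    candidates: Set[str] = set().union(*(lists[e][axis] for e in L))
    reach = candidates if label == "*" else candidates & {label}
    if not reach:
        return False                     # no matching type in D_v
    out = f"{fs(reach, '↓+')}[A^acc]"
    if steps[1:]:
        inner = rw_pred_set(steps[1:], reach, lists)
        if inner is False:
            return False
        out += f"[{inner}]"
    if axis == "↓":
        ctx = next(iter(L)) if len(L) == 1 else "*"
        out += f"/A^+[1]=ε::{ctx}"
    return out


lists = {
    "A": {"↓": {"A", "D", "E"}, "↓+": {"A", "D", "E"}},
    "D": {"↓": {"E"}, "↓+": {"E"}},
    "E": {"↓": set(), "↓+": set()},
}
print(rw_pred_set([("↓", "*"), ("↓", "D")], {"A"}, lists))
```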

Now, we generalize the formal definition of algorithm Rewrite given in the previous section to handle predicates. Consider the query $p_1/\ldots/p_n$ where $p_i = axis_i::E_i[f_i]$ and $f_i$ ($[f_i]$ is optional) is a predicate defined over element type $E_i$. We rewrite this query over a context node of type $E$ as follows:

$$\text{Rewrite}(p, E) := \downarrow^{+}::E_n[\mathcal{A}^{acc}][f'_n][\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_n)]$$

where the intermediate step $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i)$ is recursively defined by:

• $axis_i = \downarrow$: $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i) := \mathcal{A}^{E_{i-1}}[f'_{i-1}][\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_{i-1})]$

• $axis_i = \downarrow^{+}$: $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i) := \uparrow^{+}::E_{i-1}[\mathcal{A}^{acc}][f'_{i-1}][\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_{i-1})]$

where $f'_j := \text{RW\_Pred}(f_j, E_j)$.
Algorithm: RW_Pred

input : a predicate q and a list L of element types.
output: a rewritten predicate qt w.r.t. the element types in L.

1  qt := false; f' := true;
   /* the content test is optional; if it does not exist then [ε::E=c] below is omitted */
2  if (q is a single predicate axis::E[f] = c) then    // [f] is optional
3      if (E = *) then
4          reach(q, L) := ∪_{E' ∈ L} Reach(axis, E');
5      else if (∃ E' ∈ L s.t. E ∈ Reach(axis, E')) then
6          reach(q, L) := {E};
7      else
8          reach(q, L) := {};
9      if (reach(q, L) = {}) then return false;
       // rewriting of f if [f] exists
10     f' := RW_Pred(f, reach(q, L));
11     if ([f] exists and f' = false) then
12         return false;
       // [f'] below is omitted if f' = true
13     if (axis = ↓) then
14         qt := fs(reach(q, L), ↓+)[A^acc][f'][ε::E=c]/A^+[1]=ε::*;
15     else
16         qt := fs(reach(q, L), ↓+)[A^acc][f'][ε::E=c];
17 else if (q is q_f/q_r where q_f = axis::E[f] and q_r is the remaining steps) then
18     rewrite q_f as done in lines 3-16;
19     q'_r := RW_Pred(q_r, reach(q_f, L));
20     if (q'_r = false) then return false;
21     if (axis = ↓) then
22         qt := fs(reach(q_f, L), ↓+)[A^acc][f'][q'_r]/A^+[1]=ε::*;
23     else
24         qt := fs(reach(q_f, L), ↓+)[A^acc][f'][q'_r];
25 else if (q is q_1 ∧ ... ∧ q_n) then
26     if (∃ q_i s.t. RW_Pred(q_i, L) = false) then
27         qt := false;
28     else if (RW_Pred(q_i, L) = true for each q_i) then
29         qt := true;
30     else
31         qt := ∧_{RW_Pred(q_i, L) ≠ true} RW_Pred(q_i, L);
32 else if (q is q_1 ∨ ... ∨ q_n or q_1 ∪ ... ∪ q_n) then
33     if (∃ q_i s.t. RW_Pred(q_i, L) = true) then
34         qt := true;
35     else
36         qt := ∨_{RW_Pred(q_i, L) ≠ false} RW_Pred(q_i, L);
37 else if (q is not(q')) then
38     if RW_Pred(q', L) = false then
39         qt := true;
40     else if RW_Pred(q', L) ≠ true then
41         qt := not(RW_Pred(q', L));
42 else if (q is ε) then
43     qt := true;
44 return qt;

Figure 8: Predicate Rewriting.
4.3 Complexity Analysis

Given a security specification $S = (D, ann)$, we first extract the security view $V = (D_v, ann)$ corresponding to $S$, where $D_v$ is derived using our algorithm DeriveView of Figure 4. The user is allowed to query only the authorized data, represented by the DTD view $D_v$. For each query $Q$ in $\mathcal{X}$, our algorithm Rewrite translates this query into an equivalent one $Q_t$ in $\mathcal{X}^{\uparrow}_{[n,=]}$ such that, for any instance $T$ conforming to $D$ whose virtual view $T_v$ conforms to $D_v$, we have $Q(T_v) = \text{Rewrite}(Q)(T)$.

The overall time complexity of our rewriting algorithm Rewrite is stated as follows:

Theorem 4 For any security view $V = (D_v, ann)$ and any $\mathcal{X}$ query $Q$ over the DTD view $D_v$, algorithm Rewrite computes an equivalent query $Q_t$ in $\mathcal{X}^{\uparrow}_{[n,=]}$ over the original DTD in at most $O(|Q| \times |D_v|^2)$ time. □

Proof. Given an $\mathcal{X}$ query $p = axis_1::E_1[q_1]/\ldots/axis_n::E_n[q_n]$, we denote by $|p|$ the number of subqueries and sub-predicates of $p$, e.g. $|\downarrow::E_1[\text{not}(\downarrow^{+}::*)]| = 2$. Each subquery (or sub-predicate) $p_i = axis_i::E_i[q_i]$ of $p$ must be rewritten over $p_{i-1} = axis_{i-1}::E_{i-1}[q_{i-1}]$ to check the accessibility of $E_i$ and to preserve the relationship defined between $E_i$ and $E_{i-1}$. This is done in constant time by using the precomputed predicates $\mathcal{A}^{acc}$ and $\mathcal{A}^{+}$, as in lines 8-17 of algorithm Rewrite and lines 13-16 of algorithm RW_Pred. The *-label of each subquery (or sub-predicate) is eliminated as done in lines 18-23 of algorithm Rewrite and lines 3-8 of algorithm RW_Pred, which causes an additional cost. For instance, to rewrite the query $\downarrow::*$ over the set reach, the elimination of the *-label amounts to parsing each element type $E$ in reach and computing the union of its children types given by the precomputed list $Reach(\downarrow, E)$, where $|Reach(\downarrow, E)| = O(|D_v|)$. Next, the *-label of the query $\downarrow::*$ is replaced by the union of the children types of the element types in reach, which is done in at most $O(|D_v|^2)$ time. Thus, a given query $p$ can be rewritten in at most $O(|p| \times |D_v|^2)$ time. □

4.4 Query Rewriting Improvements 

We discuss in this section some possible implementations of our rewriting algorithm Rewrite that improve the overall complexity given in Theorem 4.

The first optimization avoids the *-label elimination step discussed above. For a security view $V = (D_v, ann)$, an $\mathcal{X}$ query $Q$ over the DTD view $D_v$ can then be rewritten in linear time $O(|Q|)$. Using the precomputed predicates $\mathcal{A}^{acc}$ and $\mathcal{A}^{+}$, each subquery $p_i$ of the query $p$ can be rewritten over $p_{i-1}$ in constant time by adding the predicate $\mathcal{A}^{E_{i-1}}$ (i.e. $\mathcal{A}^{+}[1]/\epsilon::E_{i-1}$) in the case of $axis_i = \downarrow$, or $\mathcal{A}^{acc}$ in the case of $axis_i = \downarrow^{+}$. For instance, the query $\downarrow::*/\downarrow::B$ over a context node of type $A$ can be rewritten into $\downarrow^{+}::B[\mathcal{A}^{acc}][\mathcal{A}^{+}[1]/\epsilon::*[\mathcal{A}^{+}[1]/\epsilon::A]]$. In the same way, each sub-predicate $q_i = axis_i::E_i$ can be rewritten over $q_{i-1}$ in constant time into $\downarrow^{+}::E_i[\mathcal{A}^{acc}]/\mathcal{A}^{+}[1] = \epsilon::E_{i-1}$ in the case of $axis_i = \downarrow$, or into $\downarrow^{+}::E_i[\mathcal{A}^{acc}]$ otherwise. For instance, the predicate $[\downarrow::*/\downarrow::C]$ over element type $B$ can be rewritten into $[\downarrow^{+}::*[\mathcal{A}^{acc}][\downarrow^{+}::C[\mathcal{A}^{acc}]/\mathcal{A}^{+}[1] = \epsilon::*]/\mathcal{A}^{+}[1] = \epsilon::B]$. Thus, the rewriting of the query $p$ amounts to parsing all its subqueries and sub-predicates, which is done in $O(|p|)$ time.
Since the query answering time includes both the rewriting time and the evaluation time of the rewritten query, the presence of the *-label in the rewritten query can lead to poor performance when some inaccessible elements are parsed by the rewritten query. For instance, consider the query $\downarrow^{+}::E$ over a context node $n$ of type $A$ with $E = *$. Without eliminating the *-label, this query is rewritten into $\downarrow^{+}::*[\mathcal{A}^{acc}][\uparrow^{+}::A[\mathcal{A}^{acc}]]$, and then, for each descendant node of $n$ of any element type, the predicates $\mathcal{A}^{acc}$ and $\uparrow^{+}::A[\mathcal{A}^{acc}]$ are evaluated. To improve query evaluation time, the elimination of the *-label in the query is indispensable: it ensures that these two predicates are evaluated only at the accessible descendant nodes of $n$ (i.e. those whose types appear in $D_v$). This elimination can give good performance since, in practice, the size of $D_v$ (the number of accessible element types) is much smaller than the size of the original DTD $D$. For this reason, the precomputed lists Reach can be sorted so that the union of the children/descendant types of two element types of $D_v$ (e.g. $Reach(\downarrow, E_1) \cup Reach(\downarrow, E_2)$) can be computed in time linear in the size of $D_v$. Accordingly, the *-label elimination phase, done in lines 18-23 of algorithm Rewrite and lines 3-8 of algorithm RW_Pred, can be improved to take at most $O(|D_v|)$ time, and any query $p$ can then be rewritten in at most $O(|p| \times |D_v|)$ time.
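The linear-time union mentioned above relies on the Reach lists being kept sorted under some fixed order of element types. A minimal Python sketch of the merge step, assuming duplicate-free lists sorted by element type name (any fixed total order would do):

```python
from typing import List


def sorted_union(xs: List[str], ys: List[str]) -> List[str]:
    """Union of two sorted, duplicate-free lists in O(len(xs) + len(ys))."""
    i, j = 0, 0
    out: List[str] = []
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            out.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    out.extend(xs[i:])          # at most one of these two tails is non-empty
    out.extend(ys[j:])
    return out


print(sorted_union(["A", "D", "E"], ["B", "E"]))   # ['A', 'B', 'D', 'E']
```

Since each list holds at most $|D_v|$ element types, one merge costs $O(|D_v|)$.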

5 Extensions 

We discuss some extensions of our rewriting approach to deal with a larger fragment of XPath queries, and to overcome some limitations of existing access specification languages.

5.1 Upward-axes Rewriting 

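For the rewriting of upward axes ($\uparrow$ and $\uparrow^{+}$), we extend algorithms Rewrite and RW_Pred without increasing the complexity of the global rewriting (as explained in Theorem 4).

In Rewrite, $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i)$ is defined for upward axes as follows:

• $axis_i = \uparrow$: $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i) := \downarrow^{+}::E_{i-1}[\mathcal{A}^{acc}][f'_{i-1}][\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_{i-1})]/\mathcal{A}^{+}[1] = \epsilon::E_i$

• $axis_i = \uparrow^{+}$: $\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_i) := \downarrow^{+}::E_{i-1}[\mathcal{A}^{acc}][f'_{i-1}][\mathit{prefix}^{-1}_{E}(p_1/\ldots/p_{i-1})]$

where $f'_j := \text{RW\_Pred}(f_j, E_j)$.

In RW_Pred, an intermediate predicate $q_i/\ldots/q_n$ (with $q_i = axis_i::E_i[f_i]$) is rewritten over element type $E_{i-1}$ in the case of upward axes as follows:

• $axis_i = \uparrow$: $\text{RW\_Pred}(q_i/\ldots/q_n, E_{i-1}) := \mathcal{A}^{E_i}[f'_i][\text{RW\_Pred}(q_{i+1}/\ldots/q_n, E_i)]$

• $axis_i = \uparrow^{+}$: $\text{RW\_Pred}(q_i/\ldots/q_n, E_{i-1}) := \uparrow^{+}::E_i[\mathcal{A}^{acc}][f'_i][\text{RW\_Pred}(q_{i+1}/\ldots/q_n, E_i)]$

where $f'_i := \text{RW\_Pred}(f_i, E_i)$.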
5.2 Revision of Access Specifications

The node accessibility w.r.t. the specification value $[Q]$ has been defined with two different meanings [4][9]. As in [4], we have assumed in the definition of element node accessibility that, for each element node $n$ concerned by an annotation of value $[Q]$, if $Q$ is not valid at $n$ (i.e. $n \not\models Q$) then $n$ and all its descendant elements are not accessible, even if some valid annotations are defined over these descendants (condition (ii) of Definition 3.1). However, in [9], the element node $n$ can be inaccessible (i.e. $n \not\models Q$) while one of its descendant elements $n'$ is accessible if it is concerned by a valid annotation.

We consider that both meanings are useful and that an access control specification language must support each of them. For this reason, we redefine the access specification of Definition 2.1 as follows:

Definition 5.1 An access specification $S$ is a pair $(D, ann)$ consisting of a DTD $D$ and a partial mapping $ann$ such that, for each production $A \rightarrow P(A)$ and each element type $B$ in $P(A)$, $ann(A, B)$, if explicitly defined, is an annotation of the form:

$$ann(A, B) := Y \mid N \mid [Q] \mid N_h \mid [Q]_h$$

□

Given an element node $n$ of type $B$ whose parent node is of type $A$, the specification values $N$ and $[Q]$ allow accessibility overriding under $n$, even though $ann(A, B) = N$, or $ann(A, B) = [Q]$ and $n \not\models Q$. The semantics of the new specification values $N_h$ and $[Q]_h$ are as follows. If the element node $n$ is concerned by an annotation with value $N_h$, then no overriding of this value is permitted for the descendant elements of $n$, i.e. if $n'$ is a descendant element of $n$, then $n'$ is not accessible even if it is concerned by a valid annotation. If $n$ is concerned by an annotation with value $[Q]_h$, then the annotations defined under $n$ (i.e. under element type $B$) take effect only if $n \models Q$. For instance, if a descendant element $n'$ of $n$ is concerned by an annotation of value $[Q']$, then $n'$ is accessible only if $n' \models Q'$ and $n \models Q$. We call an annotation with value $N_h$ or $[Q]_h$ a downward-closed annotation.

Example 5.1 We consider the hospital DTD of Figure 9(a) and give the following access specifications:

• A patient can access only his or her own diagnosis information:
$ann(department, patient) = [pname = \%name]_h$
$ann(patient, parent) = ann(patient, sibling) = N_h$
$ann(patient, visit) = N$, $ann(medication, diagnosis) = Y$

• A research institute can access the patients who have "disease1":
$ann(department, patient) = ann(parent, patient) = ann(sibling, patient) = [visit/treatment/medication[diagnosis = \text{'disease1'}]]$

For the first policy, %name is a system variable denoting the patient's name. For a given patient named "Bob", if a patient element $p$ satisfies $p \not\models pname = \text{'Bob'}$, then the whole fragment rooted at $p$ is hidden from the patient "Bob", and no annotation under the element node $p$ is applied, since it is the medical data of another patient. Also, the annotation $ann(patient, parent) = N_h$ cannot be replaced by $ann(patient, parent) = N$; otherwise $ann(medication, diagnosis) = Y$ would make the diagnosis information of the parent accessible to the patient "Bob" as if it were his own information. The same principle applies to the annotation $ann(patient, sibling) = N_h$. For the second policy, a false evaluation of the predicate must not imply the inaccessibility of the descendant patients: a given patient may not have "disease1", while his or her parent or sibling may be affected by this disease and must be shown to the research institute. □

Figure 9: Hospital DTD.

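The new access specification can be taken into account simply by applying the following changes to our rewriting approach. The predicate $\mathcal{A}^{acc}_1$ is redefined as¹:

$$\mathcal{A}^{acc}_1 := \uparrow^{*}::*\Big[\epsilon::root \vee \bigvee_{ann(A',A) \in ann} \epsilon::A/\uparrow::A'\Big][1]\Big[\epsilon::root \vee \bigvee_{(ann(A',A) = Y \mid [Q] \mid [Q]_h) \in ann} \epsilon::A_{\sigma(A',A)}/\uparrow::A'\Big]$$

while the predicate $\mathcal{A}^{acc}_2$ is redefined as:

$$\mathcal{A}^{acc}_2 := \bigwedge_{(ann(A',A) = N_h) \in ann} \text{not}(\uparrow^{+}::A/\uparrow::A') \;\wedge \bigwedge_{(ann(A',A) = [Q]_h) \in ann} \text{not}(\uparrow^{+}::A[\text{not}(Q)]/\uparrow::A')$$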



6 Experimental Results 

We have developed a prototype to demonstrate the effectiveness of our rewriting approach. The performance study is done using a real-life recursive DTD and various forms of XPath queries. The experimental results show the efficiency of our XPath query rewriting approach w.r.t. the answering approach based on the materialization of the view. Note that we cannot compare our approach with the two existing approaches which deal with query rewriting under recursive views [5][8], since they are based on the non-standard language "Regular XPath", and no practical tool is available to evaluate Regular XPath queries. The experiments were conducted on an Ubuntu system with a dual-core 2.53 GHz processor and 1 GB of memory.

XML Documents. Using the ToXGene generator [1], we generated a set of XML documents that conform to the hospital DTD of Figure 9, with sizes ranging from 10MB to 100MB.

¹ Note that $A_{\sigma(A',A)}$ gives $A[Q]$ if $ann(A', A) = [Q]$ or $[Q]_h$, and $A$ otherwise.
Security Specification. Figure 9(b) represents the hospital DTD view D_v for a
research institute studying inherited patterns of some diseases. This view shows
only patients having one or more of the diseases {disease1, disease2, disease3},
together with their parent hierarchy, and denies access to their name, address,
test and doctor data. Formally, we define this view with the following annotations:

1. ann(hospital) = Y

2. ann(hospital, name) = N

3. ann(hospital, department) = N

4. ann(department, patient) =
[↓::visit/↓::treatment/↓::medication[↓::diagnosis='disease1' or
↓::diagnosis='disease2' or ↓::diagnosis='disease3']]

5. ann(patient, pname) = N

6. ann(patient, address) = N

7. ann(patient, sibling) = N_h

8. ann(visit, date) = N

9. ann(visit, treatment) = N

10. ann(medication, diagnosis) = Y

11. ann(test, type) = Y

Annotation 7 must be downward-closed; otherwise, annotations 10 and 11 would
make some sibling data (the diagnosis and type of a visit) accessible again.

XPath Queries. We define the following set of XPath queries:

1. Q1: ↓::patient[↓+::visit[↓::diagnosis='disease1' or
↓::diagnosis='disease2' or ↓::diagnosis='disease3']].

2. Q2: ↓+::patient[↓::visit[↓::diagnosis='disease1' or
↓::diagnosis='disease2' or ↓::diagnosis='disease3'] and
not(↓+::patient/↓::visit[↓::diagnosis='disease1' or
↓::diagnosis='disease2' or ↓::diagnosis='disease3'])].

3. Q3: ↓+::diagnosis[↑::visit/↑::*/↑::*/↑::*/↑::hospital].
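The queries above use the arrow notation for XPath axes. Assuming the usual reading (↓ child, ↓+ descendant, ↓* descendant-or-self, ↑ parent, ↑+ ancestor, ↑* ancestor-or-self, ε self), the following minimal sketch rewrites such expressions into standard XPath 1.0 axis syntax, which an off-the-shelf XPath engine can evaluate. The function name `to_xpath` is hypothetical, and the sketch only renames axes; it does not perform the security rewriting Rewrite(Q, hospital).

```python
import re

# Arrow axes used in the queries above, mapped to XPath 1.0 axis names.
AXES = {
    "↓": "child",
    "↓+": "descendant",
    "↓*": "descendant-or-self",
    "↑": "parent",
    "↑+": "ancestor",
    "↑*": "ancestor-or-self",
    "ε": "self",
}

# Two-character axes come first so that "↓+::" is not read as "↓" then "+::".
_AXIS = re.compile(r"(↓\+|↓\*|↑\+|↑\*|↓|↑|ε)::")


def to_xpath(query: str) -> str:
    """Replace each arrow-axis step 'α::test' with the XPath 1.0 'axis::test'."""
    return _AXIS.sub(lambda m: AXES[m.group(1)] + "::", query)


Q3 = "↓+::diagnosis[↑::visit/↑::*/↑::*/↑::*/↑::hospital]"
print(to_xpath(Q3))
# descendant::diagnosis[parent::visit/parent::*/parent::*/parent::*/parent::hospital]
```

Because the `*` in `↑::*` follows the `::`, it is left untouched as a wildcard node test.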

The first query returns patients some of whose ancestors also had the same
diseases. The second query Q2 returns the first generation in which the discussed
diseases appeared, and Q3 returns the diagnosis of the second generation of
infected patients. Each query Q_i is rewritten over the root node (hospital) of
each document into Rewrite(Q_i, hospital), using the security view
V = (D_v, ann) defined by the DTD view of Figure 9(b) and the annotations above.
Figure 10: XPath Queries Evaluation Time.



Approaches. We compare our rewriting approach with the materialization
approach. Given a security view V = (D_v, ann), materialization consists in
attaching an accessibility label (+/-) to each element node of the document that
is concerned by an annotation of V. Each element node n that is not yet labeled
(i.e. no invalid downward-closed annotation is defined over its ancestor
elements) is labeled with "+" if it is concerned by an annotation with value Y,
or with value [Q] | [Q]_h and n ⊨ Q; conversely, n is labeled with "-" if it is
concerned by an annotation with value N | N_h, or with value [Q] | [Q]_h and
n ⊭ Q. If an element node n is concerned by an invalid downward-closed
annotation (with value N_h, or with value [Q]_h and n ⊭ Q), then n and all its
descendant elements are labeled with "-". After all the annotations of the
security view V have been applied to the document, each unlabeled element node
inherits the label of its nearest labeled ancestor element. The resulting
document is called the fully annotated document. Finally, the materialized view
of the original document is computed by deleting all inaccessible element nodes
(labeled with "-") from the fully annotated document, and user queries are
evaluated directly over this view. We therefore compare the answering time of
the materialization approach (the view materialization time plus the query
evaluation time over the materialized view) with that of our rewriting approach
(the rewriting time of the query plus the evaluation time of the rewritten query
over the original document).
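The materialization baseline described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the annotation encoding, the Python callables standing in for the XPath qualifiers Q, the toy document, and the way denied nodes are removed (their accessible descendants are re-attached to the nearest accessible ancestor) are all assumptions made for illustration. The inheritance step is folded into a single top-down traversal, which visits ancestors before descendants and therefore yields the same labels.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical encoding of ann: (parent tag, child tag) -> "Y" | "N" | "Nh"
# | ("Q", pred) | ("Qh", pred). pred is a Python callable standing in for
# the XPath qualifier Q, so that the sketch needs only the standard library.
Pred = Callable[[ET.Element], bool]
Value = Union[str, Tuple[str, Pred]]
Annotations = Dict[Tuple[str, str], Value]


def label(node: ET.Element, parent_tag, ann: Annotations,
          inherited: str, out: Dict[ET.Element, str]) -> None:
    """Top-down pass that sets out[n] to '+' or '-' for every element n."""
    value = ann.get((parent_tag, node.tag))
    if value is None:
        mark = inherited  # not annotated: inherit nearest labeled ancestor
    elif value == "Nh" or (isinstance(value, tuple) and value[0] == "Qh"
                           and not value[1](node)):
        for n in node.iter():  # invalid downward-closed annotation:
            out[n] = "-"       # n and all its descendants are denied
        return
    elif value == "Y" or (isinstance(value, tuple) and value[1](node)):
        mark = "+"
    else:  # "N", or ("Q", pred) with pred false
        mark = "-"
    out[node] = mark
    for child in node:
        label(child, node.tag, ann, mark, out)


def materialize(node: ET.Element, out: Dict[ET.Element, str]) -> List[ET.Element]:
    """Copy accessible nodes; a '-' node is dropped and its accessible
    descendants are attached to the nearest accessible ancestor."""
    kids = [k for child in node for k in materialize(child, out)]
    if out[node] == "-":
        return kids
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    copy.extend(kids)
    return [copy]


def view_of(doc: ET.Element, ann: Annotations) -> ET.Element:
    out: Dict[ET.Element, str] = {}
    label(doc, None, ann, "+", out)  # root assumed accessible: ann(root) = Y
    return materialize(doc, out)[0]


def has_disease(patient: ET.Element) -> bool:
    """Stand-in for the qualifier of annotation 4."""
    return any(d.text in {"disease1", "disease2", "disease3"}
               for d in patient.findall("visit/treatment/medication/diagnosis"))


# Subset of annotations 1-11 that matters for the toy document below.
ANN: Annotations = {
    ("hospital", "name"): "N", ("hospital", "department"): "N",
    ("department", "patient"): ("Q", has_disease),
    ("patient", "pname"): "N", ("patient", "sibling"): "Nh",
    ("visit", "date"): "N", ("visit", "treatment"): "N",
    ("medication", "diagnosis"): "Y",
}

DOC = ("<hospital><name>H</name><department><patient><pname>P</pname>"
       "<visit><date>d</date><treatment><medication><diagnosis>disease1"
       "</diagnosis></medication></treatment></visit><sibling><patient>"
       "<visit><treatment><medication><diagnosis>disease2</diagnosis>"
       "</medication></treatment></visit></patient></sibling></patient>"
       "</department></hospital>")

print(ET.tostring(view_of(ET.fromstring(DOC), ANN), encoding="unicode"))
# <hospital><patient><visit><diagnosis>disease1</diagnosis></visit></patient></hospital>
```

Replacing N_h by N for (patient, sibling) in this toy setting lets annotation 10 re-expose the sibling's diagnosis (disease2), which is the leak that the downward-closed annotation 7 prevents.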

Performance Results. The experimental results are shown in Figure 10, which
gives the answering time of each query under our rewriting algorithm and under
the materialization approach. The size of the answers ranges from a few hundred
to a few thousand nodes. Figure 10 clearly shows that our algorithm remains
more efficient than the materialization approach.

We observe, first, that translating XPath queries from the downward class X to
the extended XPath class used by our rewriting does not lead to poor
performance: the average answering time of our rewriting approach generally
remains below 8 seconds for a large XML document. Second, a query containing
time-consuming elements such as *-labels or parent axes does not degrade the
rewriting performance, as shown by query Q3.
7 Conclusions and Future Work

The proposed approach yields the first practical solution for rewriting XPath
queries over recursive XML views using only the expressive power of standard
XPath. We investigated the extension of the downward class of XPath queries
with some axes and operators in order to make query rewriting possible under
recursion.

The experiments show the efficiency of our approach compared with the
materialization approach. Most importantly, translating queries from the
downward class X to the extended XPath class used by our rewriting does not
impact query-answering performance. We have discussed how our approach can be
extended to deal with upward axes at no additional cost. Lastly, we presented a
revision of the access specification language that overcomes some limitations
in the definition of access privileges.

As future work, we plan first to provide an optimized version of our approach,
and then to apply the same principle to securing XML updates.